Floating point exception with tessdata models since version 5.4.0 #4257
Comments
I get no crash with a debug build on Debian GNU/Linux.
This patch fixes it:
I still have to examine why the exception does not occur on Linux.
Arch Linux installs model files from tessdata, while Debian installs model files from tessdata_fast. With a tessdata model I could reproduce the FP overflow in NormEvidenceOf, which is called with FLT_MAX and tries to calculate the square of this value. I wonder why none of our continuous integration tests detected this regression. Obviously the tests must be improved.
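To illustrate the overflow described above, here is a minimal sketch using the standard `<cfenv>` flags. It is not Tesseract code: `square_overflows` and `square_overflows_double` are hypothetical helpers that only show that squaring FLT_MAX in single precision raises FE_OVERFLOW, while the same product fits in a double. Such a flag only turns into SIGFPE when the corresponding trap is enabled (for example with glibc's `feenableexcept`); that this is what happens in the crashing binary is an assumption here.

```cpp
#include <cfenv>
#include <cfloat>
#include <cmath>

// Hypothetical helper, not part of Tesseract: squares x in single precision
// and reports whether the multiplication raised the FE_OVERFLOW flag.
bool square_overflows(float x, float *result) {
  volatile float input = x;  // volatile keeps the compiler from folding the product
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float squared = input * input;
  *result = squared;
  return std::fetestexcept(FE_OVERFLOW) != 0;
}

// Same operation in double precision: FLT_MAX squared is roughly 1.2e77,
// far below DBL_MAX, so no overflow flag is expected.
bool square_overflows_double(float x, double *result) {
  volatile double input = x;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double squared = input * input;
  *result = squared;
  return std::fetestexcept(FE_OVERFLOW) != 0;
}
```

Calling `square_overflows(FLT_MAX, &f)` should report an overflow and leave an infinite value in `f`, whereas `square_overflows_double(FLT_MAX, &d)` should not. A regression test could run the classifier with trapping enabled on a tessdata model to catch this class of bug, but that setup is not shown here.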
Signed-off-by: Stefan Weil <[email protected]>
The new release 5.4.1 includes the fix, so this issue can be closed as soon as the fix has been confirmed with an updated package for Arch Linux.
I compiled the package; it seems to be working fine: no more floating‐point exceptions.
Current Behavior
I use OCRmyPDF on Arch Linux. The program has been crashing since yesterday, when tesseract was updated from version 5.3.4-2 to 5.4.0-1. After a downgrade, tesseract works as expected with the same image.
I executed the ocrmypdf commands manually:
$ gs -dQUIET -dSAFER -dBATCH -dNOPAUSE -dInterpolateControl=-1 -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -r200.161797x200.161797 -dPDFSTOPONERROR -o image.png -sstdout=%stderr -dAutoRotatePages=/None -f doc20240608121758.pdf
$ tesseract -l deu image.png 000001_ocr_hocr hocr txt
[1] 9771 floating point exception (core dumped)
$ pacman -U tesseract-5.3.4-2-x86_64.pkg.tar.zst
$ tesseract -l deu image.png 000001_ocr_hocr hocr txt
(works fine)
For data protection reasons, I recreated a document that caused the program to crash:
Expected Behavior
No response
Suggested Fix
No response
tesseract -v
tesseract 5.4.0
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.2) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.8.0 OpenSSL/3.3.1 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0
Operating System
No response
Other Operating System
Archlinux
uname -a
Linux pc 6.9.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 31 May 2024 15:14:45 +0000 x86_64 GNU/Linux
Compiler
No response
CPU
Intel i7-8650U
Virtualization / Containers
No response
Other Information
$ gdb --args tesseract -l deu image.png 000001_ocr_hocr hocr txt
(gdb) run
Starting program: /usr/bin/tesseract -l deu image.png 000001_ocr_hocr hocr txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff20006c0 (LWP 10820)]
[New Thread 0x7ffff16006c0 (LWP 10821)]
[New Thread 0x7ffff0c006c0 (LWP 10822)]
Thread 1 "tesseract" received signal SIGFPE, Arithmetic exception.
0x00007ffff7d8afa4 in tesseract::Classify::ComputeNormMatch(int, tesseract::FEATURE_STRUCT const&, bool) () from /usr/lib/libtesseract.so.5
(gdb) bt
#0 0x00007ffff7d8afa4 in tesseract::Classify::ComputeNormMatch(int, tesseract::FEATURE_STRUCT const&, bool) () from /usr/lib/libtesseract.so.5
#1 0x00007ffff7d806c5 in tesseract::Classify::ComputeIntCharNormArray(tesseract::FEATURE_STRUCT const&, unsigned char*) () from /usr/lib/libtesseract.so.5
#2 0x00007ffff7d705ad in tesseract::Classify::ComputeCharNormArrays(tesseract::FEATURE_STRUCT*, tesseract::INT_TEMPLATES_STRUCT*, unsigned char*, unsigned char*) () from /usr/lib/libtesseract.so.5
#3 0x00007ffff7d709fe in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, std::vector<tesseract::UnicharRating, std::allocator<tesseract::UnicharRating> >*) ()
from /usr/lib/libtesseract.so.5
#4 0x00007ffff7d9312b in tesseract::TessClassifier::UnicharClassifySample(tesseract::TrainingSample const&, tesseract::Image, int, int, std::vector<tesseract::UnicharRating, std::allocator<tesseract::UnicharRating> >*) () from /usr/lib/libtesseract.so.5
#5 0x00007ffff7d6e575 in tesseract::Classify::CharNormClassifier(tesseract::TBLOB*, tesseract::TrainingSample const&, tesseract::ADAPT_RESULTS*) () from /usr/lib/libtesseract.so.5
#6 0x00007ffff7d73e76 in tesseract::Classify::DoAdaptiveMatch(tesseract::TBLOB*, tesseract::ADAPT_RESULTS*) () from /usr/lib/libtesseract.so.5
#7 0x00007ffff7d6c5a3 in tesseract::Classify::AdaptiveClassifier(tesseract::TBLOB*, tesseract::BLOB_CHOICE_LIST*) () from /usr/lib/libtesseract.so.5
#8 0x00007ffff7e4d1a4 in tesseract::Wordrec::call_matcher(tesseract::TBLOB*) () from /usr/lib/libtesseract.so.5
#9 0x00007ffff7e5aaeb in tesseract::Wordrec::classify_blob(tesseract::TBLOB*, char const*, tesseract::ScrollView::Color, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#10 0x00007ffff7e5ac41 in tesseract::Wordrec::classify_piece(std::vector<tesseract::SEAM*, std::allocator<tesseract::SEAM*> > const&, short, short, char const*, tesseract::TWERD*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#11 0x00007ffff7e4b1d4 in tesseract::Wordrec::chop_word_main(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#12 0x00007ffff7e4b6f2 in tesseract::Wordrec::cc_recog(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#13 0x00007ffff7d273f9 in tesseract::Tesseract::recog_word_recursive(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#14 0x00007ffff7d2854b in tesseract::Tesseract::recog_word(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#15 0x00007ffff7d28927 in tesseract::Tesseract::tess_segment_pass_n(int, tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#16 0x00007ffff7cda5a2 in tesseract::Tesseract::match_word_pass_n(int, tesseract::WERD_RES*, tesseract::ROW*, tesseract::BLOCK*) () from /usr/lib/libtesseract.so.5
#17 0x00007ffff7ce1102 in tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /usr/lib/libtesseract.so.5
#18 0x00007ffff7cd0c51 in tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*), bool, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /usr/lib/libtesseract.so.5
#19 0x00007ffff7cd1ac5 in tesseract::Tesseract::classify_word_and_language(int, tesseract::PAGE_RES_IT*, tesseract::WordData*) () from /usr/lib/libtesseract.so.5
#20 0x00007ffff7cd573d in tesseract::Tesseract::RecogAllWordsPassN(int, tesseract::ETEXT_DESC*, tesseract::PAGE_RES_IT*, std::vector<tesseract::WordData, std::allocator<tesseract::WordData> >*) ()
from /usr/lib/libtesseract.so.5
#21 0x00007ffff7cd5ee5 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES*, tesseract::ETEXT_DESC*, tesseract::TBOX const*, char const*, int) () from /usr/lib/libtesseract.so.5
#22 0x00007ffff7c9a23d in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) () from /usr/lib/libtesseract.so.5
#23 0x00007ffff7c9d963 in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#24 0x00007ffff7c9ef88 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#25 0x00007ffff7c9f1b4 in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#26 0x0000555555558797 in ?? ()
#27 0x00007ffff714ec88 in ?? () from /usr/lib/libc.so.6
#28 0x00007ffff714ed4c in __libc_start_main () from /usr/lib/libc.so.6
#29 0x00005555555598b5 in ?? ()