Infinite recursion for tesseract --list-langs with conda-forge binary #4230
Comments
Can you please describe exactly how to reproduce the problem? I do not have macOS, but on Windows I got this result:
> set TESSDATA_PREFIX=""
> tesseract --list-langs
List of available languages in """/" (0):
>set TESSDATA_PREFIX=
>tesseract --list-langs
List of available languages in "./" (1):
Downloads/dotslayer
On Linux I got this result:
$ export TESSDATA_PREFIX=""
$ tesseract --list-langs
List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (2):
eng
osd
On macOS, your Linux commands reproduce the infinite loop.
@jonashaag: how did you install tesseract? @stweil: Can you have a look at this?
I don't get an infinite loop on macOS (installed with Homebrew). An empty
@jonashaag, can you see the arguments of the
Aha, interesting, the Homebrew version works fine. The conda-forge version that I've been using doesn't. Output from macOS's
There are no symlinks in
Conda links: I am still searching for a description of the build process which was used for the Conda Tesseract package.
Thanks. I could build it locally, and the result works fine. It looks like the error is in In my tests with the conda-forge binary I get some more strange results:
I cannot explain those strange results from the Tesseract code, and debugging without debug symbols is rather time-consuming. Therefore I suggest reporting the issue to conda-forge.
I think it is related to Conda's prefix replacement. The prefix ends up being empty or something like that. I will build a package with debug symbols/prints and check what's going on.
That's a good hint which helped me find the root of the problem. Patching TESSDATA_PREFIX in the compiled library currently does not work for Tesseract because that string is assigned to This is not restricted to installations on macOS. Installations with Conda on Linux will have the same problem. I think there is a simple fix which I will try later.
Conda installations patch TESSDATA_PREFIX in the binary. That does not work for std::string because the length won't be patched, so use a normal C string which can be patched. Simplify also the code which checks the last character of datadir. Signed-off-by: Stefan Weil <[email protected]>
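To illustrate the failure mode described above, here is a minimal C++ sketch. It does not patch a real binary; it only simulates the result in memory: the path bytes are replaced by a shorter, NUL-padded path while the length fixed at build time stays the same. The names kBuildPrefix, make_patched_datadir and trim_at_first_nul are hypothetical and not part of Tesseract, and the trimming step is just one possible remedy, not necessarily what the pull requests in this thread do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical placeholder prefix, standing in for the long path that is
// baked into the binary at build time (an assumption for this sketch).
constexpr char kBuildPrefix[] = "/long/placeholder/build/prefix/share/tessdata/";
constexpr std::size_t kBuildLen = sizeof(kBuildPrefix) - 1;

// Simulates a patched binary: the shorter install path overwrites the
// placeholder bytes and the remainder is NUL-padded, but the length that
// was fixed at compile time (kBuildLen) is left untouched.
std::string make_patched_datadir(const char *install_path) {
  char bytes[sizeof(kBuildPrefix)] = {};         // start with all NUL bytes
  std::strncpy(bytes, install_path, kBuildLen);  // copy path, keep padding
  return std::string(bytes, kBuildLen);          // stale length, embedded NULs
}

// Drops everything from the first NUL byte on, so that size() matches the
// path a C API would actually see through c_str().
void trim_at_first_nul(std::string &s) {
  s.resize(std::strlen(s.c_str()));
}
```

With a patched path such as "/opt/env/share/tessdata/", the untrimmed string still reports the build-time length and its last character is a NUL byte rather than '/', so a check on the last character of datadir sees the wrong thing. After trim_at_first_nul the string compares equal to the shorter path.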
Pull request #4239 should fix this issue.
Fantastic! Thanks a million!
@jonashaag, I am sorry, but my patch does not work. I'll try a different fix tomorrow.
…#4230) Signed-off-by: Stefan Weil <[email protected]>
Pull request #4240 has a different fix which I now tested successfully on Linux and on macOS.
Current Behavior
tesseract --list-langs goes into an infinite loop on macOS if TESSDATA_PREFIX is empty.
macOS Instruments shows infinite recursion in addAvailableLanguages, and a LOT of stat64 calls (tens of thousands per second).
Expected Behavior
Should not go into infinite recursion
Suggested Fix
No response
tesseract -v
tesseract 5.3.4
leptonica-1.83.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.2
Found NEON
Operating System
macOS 14 Sonoma
Other Operating System
No response
uname -a
Darwin ... 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:41 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T8103 arm64
Compiler
From conda-forge
CPU
No response
Virtualization / Containers
None
Other Information
No response