Run LSTM recognition in multiple threads #4275
base: main
Init time option lstm_num_threads should be used to set the number of LSTM threads
src/ccmain/tesseractclass.cpp
Outdated
@@ -477,7 +481,10 @@ Tesseract::~Tesseract() {
for (auto *lang : sub_langs_) {
delete lang;
}
delete lstm_recognizer_;
for (int i = 0; i < lstm_recognizers_.size(); ++i) { |
i should use the same data type as the return value of size().
for (auto &&r : lstm_recognizers_) delete r;
Thanks for your review comments. I have addressed this now.
lstm_recognizer_ = new LSTMRecognizer(language_data_path_prefix.c_str());
ASSERT_HOST(lstm_recognizer_->Load(this->params(), lstm_use_matrix ? language : "", mgr));
for (int i = 0; i < lstm_num_threads; ++i) {
lstm_recognizers_.push_back(new LSTMRecognizer(language_data_path_prefix.c_str())); |
Fine for me.
But maybe it is possible to switch to unique_ptrs here?
Update: since std::vector<LSTMRecognizer *> lstm_recognizers_; is new code, could you try to use unique_ptrs here?
I was just following the existing Tesseract design pattern of using LSTMRecognizer * and performing new and delete for allocation and deallocation.
As per C++ best practice, we should ideally use shared_ptrs everywhere LSTMRecognizer is accessed, to avoid dangling pointers. But that would involve a lot of refactoring and would stray from the main purpose of this PR. Hence I just followed the existing Tesseract practice of using LSTMRecognizer *.
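For illustration only, the unique_ptr variant the reviewer asks about could look roughly like the sketch below. FakeRecognizer and MakeRecognizers are hypothetical stand-ins, not Tesseract code; the real LSTMRecognizer has its own constructor and Load step, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for LSTMRecognizer. It only counts live instances
// so that the ownership behaviour can be observed.
struct FakeRecognizer {
  static int &LiveCount() {
    static int count = 0;
    return count;
  }
  explicit FakeRecognizer(const std::string &prefix) : prefix_(prefix) { ++LiveCount(); }
  ~FakeRecognizer() { --LiveCount(); }
  std::string prefix_;
};

// Creates one recognizer per thread. The vector owns the objects, so no
// explicit delete loop is needed in a destructor.
std::vector<std::unique_ptr<FakeRecognizer>> MakeRecognizers(int num_threads,
                                                             const std::string &prefix) {
  assert(num_threads >= 1);
  std::vector<std::unique_ptr<FakeRecognizer>> recognizers;
  recognizers.reserve(static_cast<std::size_t>(num_threads));
  for (int i = 0; i < num_threads; ++i) {
    recognizers.push_back(std::make_unique<FakeRecognizer>(prefix));
  }
  return recognizers;
}
```

With this layout the recognizers are released when the vector goes out of scope, which also removes the signed/unsigned index question raised for the delete loop.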
src/ccmain/control.cpp
Outdated
auto segment_start = words->begin() + segment_size;
for (int i = 1; i < lstm_num_threads; ++i) {
auto segment_end = (i == lstm_num_threads - 1) ? words->end() : segment_start + segment_size;
futures.push_back(std::async(std::launch::async, &Tesseract::RecogWordsSegment, |
What about a static thread pool here?
If this code is called multiple times, how does std::async perform compared with a static thread pool, considering that std::async with std::launch::async typically starts a new thread on each call?
Good question.
As stweil has clarified in his latest comment, running multiple LSTM threads is not meant for mass production. This feature is probably meant for consumer-end devices that run single-page OCR once in a while, but with the lowest possible latency. In such cases thread creation and deletion is not a big overhead. If we do come across use cases where that overhead becomes significant, we could look at replacing this with a thread pool at that point.
The default case (lstm_num_threads == 1) must not create a new thread.
Yes, the default case (lstm_num_threads == 1) does not create a new thread. As you can see, the loop starts with i = 1 and hence does not execute when lstm_num_threads == 1.
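The segment-splitting pattern discussed in this thread can be sketched in a self-contained form as follows. This is not the PR's code: ProcessSegment and ProcessInSegments are hypothetical names, and a vector of ints stands in for the word list. It is only meant to show why a loop starting at i = 1 launches no asynchronous task in the default single-thread case.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-segment work: doubles every value in [begin, end).
void ProcessSegment(std::vector<int>::iterator begin, std::vector<int>::iterator end) {
  for (auto it = begin; it != end; ++it) {
    *it *= 2;
  }
}

// Splits items into num_threads segments. Segments 1..n-1 are handed to
// std::async; segment 0 runs on the calling thread. Returns the number of
// asynchronous tasks that were launched.
std::size_t ProcessInSegments(std::vector<int> &items, int num_threads) {
  assert(num_threads >= 1);
  const auto segment_size = static_cast<std::ptrdiff_t>(items.size()) / num_threads;
  std::vector<std::future<void>> futures;
  auto segment_start = items.begin() + segment_size;
  for (int i = 1; i < num_threads; ++i) {
    // The last segment also takes any remainder left by the integer division.
    auto segment_end = (i == num_threads - 1) ? items.end() : segment_start + segment_size;
    futures.push_back(
        std::async(std::launch::async, ProcessSegment, segment_start, segment_end));
    segment_start = segment_end;
  }
  // First segment on the calling thread; with one thread it is the whole range.
  ProcessSegment(items.begin(),
                 num_threads == 1 ? items.end() : items.begin() + segment_size);
  const std::size_t launched = futures.size();
  for (auto &f : futures) {
    f.get();  // Wait for completion and rethrow any exception from the task.
  }
  return launched;
}
```

Calling ProcessInSegments(items, 1) returns 0 launched tasks, while ProcessInSegments(items, 4) returns 3, since the calling thread handles the first segment itself.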
Many thanks for this nice contribution. With this pull request users have the choice of using the new argument Maybe we could also extend the command line syntax to have
Just to clarify this statement: it's only true for the OCR of a single page. For mass production it is still better to run (number of cores) parallel Tesseract processes because then all processing steps use 100 % of the available resources.
And many thanks to you for reviewing this patiently.
This
When I tested tesseract with a psm of 3 (which is the default for
Totally agreed. This is meant for latency-sensitive real-time applications, with OCR probably running on the consumer's device itself.
Changed the WERD_RES linked list to use shared pointers instead of raw pointers. This is needed so that even if one thread deletes a WERD_RES object, other threads that need to iterate through the list can still access it safely. In terms of LSTM processing, each WERD_RES is processed by only one thread. The change is needed because all threads iterate through the same singly linked list.
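The lifetime argument here can be illustrated with a minimal sketch. WordNode and UnlinkNext are hypothetical stand-ins, not the actual WERD_RES list classes, and the sketch is single-threaded: it shows only that a node stays alive while some shared_ptr still refers to it. Concurrent modification of the links themselves would still need synchronization.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a node of the WERD_RES list.
struct WordNode {
  std::string text;
  std::shared_ptr<WordNode> next;
};

// Unlinks the node that follows prev. The unlinked node is destroyed only
// when the last shared_ptr referring to it is released.
void UnlinkNext(const std::shared_ptr<WordNode> &prev) {
  if (prev && prev->next) {
    auto removed = prev->next;  // Keep the node alive while relinking.
    prev->next = removed->next;
  }
}
```

If an iterator on another code path holds a shared_ptr to the unlinked node, it can still read the node and follow its next pointer to continue the traversal.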
@stweil
Now it is much much worse.
Remove std::async.
I suggest using the previous version as the base.
@egorpugin I am not sure I understand your comment here. Could you please elaborate on what is "much much worse"?
You need to provide a very detailed description of:
I tested this PR on my Mac (M1 chip). I have a few observations to share:
This behavior differs from Tesseract 5.4.1, which produces correct output for the same inputs in both cases. I hope this feedback is helpful.
Thanks for trying it out and providing a detailed analysis. Here is my reply.
Init time option lstm_num_threads should be used to set the number of LSTM threads. This will ensure that word recognition can run independently in multiple threads, thus effectively utilizing multi-core processors.
Following are my test results for a sample screenshot.
CPU : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
OS : Windows
Compiler : MSVC 19.38.33130.0 (Installed from Visual Studio 2022)
Model: eng.traineddata from tessfast
PSM: 6
Total time taken for the Recognize API call, built without OpenMP:
With lstm_num_threads=1, total time taken = 3.95 seconds
With lstm_num_threads=4, total time taken = 1.4 seconds
On the other hand, here are the numbers with OpenMP:
OMP_THREAD_LIMIT not set, total time taken = 3.59 seconds
OMP_THREAD_LIMIT=4, total time taken = 3.57 seconds
OMP_THREAD_LIMIT=1, total time taken = 4.19 seconds
As we can observe, this branch with lstm_num_threads set to 4 is roughly 2.5 times faster than the best result from the currently supported OpenMP multithreading (1.4 seconds versus 3.57 seconds). Setting lstm_num_threads equal to the number of cores in the processor is expected to give the best performance.