Best Effort You Can Do to Increase Performance of Donut 🍩
If you don't have a GPU, you may want to read this.
In this article, I delve further into optimizing the performance of the Donut 🍩 model on a CPU-only machine. While AI models generally perform better with GPU acceleration, my recent experiments demonstrate that running the Donut model solely on a CPU is feasible, albeit with reduced performance compared to a GPU-enabled setup.
In my previous article, I noted that extracting information from a single image took approximately 3–4 seconds on a CPU. Those tests were conducted with only one user operating the system at a time.
However, challenges arose when I conducted a performance test using LoadRunner with 30 concurrent users. Under this load, the Donut 🍩 model was unable to handle the requests effectively, as evidenced by numerous timeout errors (see image below).
To address this, I referenced this GitHub discussion, which suggested increasing the number of CPU threads to improve utilization. Implementing this approach, I modified my code by adding a threading configuration, ensuring that the number of threads (x) does not exceed the available CPUs. In my case, with 24 CPUs, the maximum value for x was 24.
torch.set_num_threads(x)  # replace x with the desired thread count
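A minimal sketch of how this configuration might look in practice, capping the thread count at the number of available CPUs as described above (the value 12 reflects my own best result; your optimum may differ):

```python
import os
import torch

# Cap the intra-op thread count at the number of available CPUs
# (24 in my setup). 12 was the best value in my single-user tests;
# treat it as a starting point, not a universal constant.
num_threads = min(12, os.cpu_count() or 1)
torch.set_num_threads(num_threads)

print(torch.get_num_threads())
```

Note that `torch.set_num_threads` only controls intra-op parallelism (how many threads a single operation, such as a matrix multiply, may use); it does not limit how many requests your web server handles at once.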
Using Postman, I tested various thread configurations to identify the optimal setup for the model. The results revealed improved performance when using 12 threads.
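A simple way to sweep thread counts locally, before reaching for Postman, is to time a stand-in workload at each setting. The sketch below uses a matrix multiply as a proxy rather than the actual Donut inference pipeline, so the absolute numbers will differ, but the relative trend across thread counts is what matters:

```python
import time
import torch

def benchmark(threads: int, size: int = 1024, iters: int = 20) -> float:
    """Time `iters` matrix multiplies using the given intra-op thread count."""
    torch.set_num_threads(threads)
    x = torch.randn(size, size)
    start = time.perf_counter()
    for _ in range(iters):
        torch.mm(x, x)
    return time.perf_counter() - start

for n in (1, 4, 7, 12, 24):
    print(f"{n:2d} threads: {benchmark(n):.3f}s")
```

For a more faithful comparison, replace the matrix multiply with a real call to your Donut inference function.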
Note: Setting the interop thread count is optional, since my final results showed no significant impact when I added it.
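For completeness, this is how the interop setting is applied in PyTorch. It controls parallelism *between* independent operations (as opposed to within a single operation), and it must be called early, before any inter-op parallel work starts, or PyTorch raises a runtime error. The value 4 below is an arbitrary example, not a recommendation:

```python
import torch

# Must run before any inter-op parallel work begins, e.g. at the top
# of your service's entry point. 4 is just an illustrative value.
torch.set_num_interop_threads(4)

print(torch.get_num_interop_threads())
```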
I then re-ran the performance test with 30 concurrent users.
Interestingly, the outcome differed from the Postman results. With 12 threads, the Donut model still struggled under load. However, when I reduced the thread count to 7, the model successfully handled the concurrent requests.
Conclusion
Based on these findings, the Donut 🍩 model can effectively operate on a CPU-only machine with 24 cores, supporting up to 30 concurrent users when configured to use 7 threads. I plan to continue my experiments by conducting similar performance tests on a GPU-equipped machine. I look forward to sharing those results soon — stay tuned!