Best Effort You Can Do to Increase Performance of Donut 🍩

Rizkynindra
3 min read · Jan 5, 2025

If you don't have a GPU, you may want to read this.

In this article, I delve further into optimizing the performance of the Donut 🍩 model on a CPU-only machine. While AI models generally perform better with GPU acceleration, my recent experiments show that running the Donut model solely on a CPU is feasible, albeit with reduced performance compared to a GPU-enabled setup.

In my previous article, I noted that extracting information from a single image took approximately 3–4 seconds on a CPU. Those tests were conducted with only one user operating the system at a time.

CPU Specifications.

However, challenges arose when I conducted a performance test using LoadRunner with 30 concurrent users. Under this load, the Donut 🍩 model was unable to handle the requests effectively, as evidenced by numerous timeout errors (see image below).

Timeout error

To address this, I referenced a GitHub discussion which suggested increasing the number of CPU threads to improve utilization. Implementing this approach, I modified my code by adding a threading configuration, ensuring that the number of threads (x) does not exceed the number of available CPUs. In my case, with 24 CPUs, the maximum value for x was 24.

torch.set_num_threads(x)  # replace x with the desired thread count
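Applied at startup, before the model is loaded, the configuration might look like the sketch below. The cap against `os.cpu_count()` mirrors the rule above (never exceed the available CPUs); the value 7 is the setting that worked best for me later in this article, so treat it as a starting point, not a universal answer.

```python
import os
import torch

# Intra-op thread count: how many threads PyTorch uses inside a
# single operator. Cap it at the number of available CPUs.
x = 7  # starting point; tune this for your own workload
available_cpus = os.cpu_count() or 1
torch.set_num_threads(min(x, available_cpus))

print(torch.get_num_threads())
```

Call this once, early, before loading the Donut model, so every subsequent inference picks up the setting.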

Using Postman, I tested various thread configurations to identify the optimal setup for the model. The results showed the best performance with 12 threads.
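Before reaching for Postman, you can get a rough local comparison by timing a representative workload at each thread count. This is only a sketch: the matrix multiply below is a stand-in for Donut inference, and the sizes and thread counts are arbitrary choices, not values from my tests.

```python
import time
import torch

def time_workload(n_threads: int, size: int = 512, repeats: int = 5) -> float:
    """Return the mean wall-clock time (seconds) of a stand-in workload
    run with the given intra-op thread count."""
    torch.set_num_threads(n_threads)
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    start = time.perf_counter()
    for _ in range(repeats):
        _ = a @ b  # stand-in for a model forward pass
    return (time.perf_counter() - start) / repeats

for n in (1, 7, 12, 24):
    print(f"{n:>2} threads: {time_workload(n) * 1000:.2f} ms")
```

Keep in mind that single-request latency and behavior under concurrent load can diverge, which is exactly what the next test revealed.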

best threads using postman

Note: Setting the interop thread count is optional, since my final results showed no significant impact when I added it.

I then re-ran the performance test with 30 concurrent users.

best threads for 30 concurrent users

Interestingly, the outcome differed from the Postman results. With 12 threads, the Donut model still struggled under load. When I reduced the thread count to 7, however, the model successfully handled the concurrent requests.

Conclusion

Based on these findings, the Donut 🍩 model can effectively operate on a CPU-only machine with 24 cores, supporting up to 30 concurrent users when configured to use 7 threads. I plan to continue my experiments by conducting similar performance tests on a GPU-equipped machine. I look forward to sharing those results soon — stay tuned!
