Do We Really Need a GPU to Run ‘Donut’ 🍩?
I investigated my own model, and here is what I found…
In my previous article (Extract Information using ‘Donut’ 🍩. Will it be the OCR Killer? 🔪 | by Rizkynindra | Medium), I talked about the Document Understanding Transformer (Donut), whose results can beat the accuracy of TesseractOCR when extracting information from Indonesian ID cards. Then I realized that I had never paid attention to its performance.
I ran the model on my local PC with 32 GB of RAM and a 12th Gen Intel Core i9 (24 CPUs). With a single image, it takes around 3–4 seconds to extract the information. Then, when I moved the model to an Ubuntu server whose exact specifications I don’t know, performance got worse: 7–9 seconds per image. I didn’t stop at a single image; I also ran the experiment with 89 images, and the gap was just as clear.
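If you want to reproduce this kind of measurement, here is a minimal sketch of the CPU baseline. The checkpoint name, task prompt, and image path are placeholders (a public Donut checkpoint, not my fine-tuned ID-card model):

```python
import time

from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Placeholder checkpoint; swap in your own fine-tuned Donut model.
CKPT = "naver-clova-ix/donut-base-finetuned-cord-v2"

processor = DonutProcessor.from_pretrained(CKPT)
model = VisionEncoderDecoderModel.from_pretrained(CKPT)
model.eval()  # inference only, running on CPU for this baseline

def extract(image_path: str) -> str:
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    # Placeholder task token that matches the placeholder checkpoint above.
    decoder_input_ids = processor.tokenizer(
        "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
    ).input_ids
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
    )
    return processor.batch_decode(outputs, skip_special_tokens=True)[0]

start = time.perf_counter()
result = extract("sample_id_card.jpg")  # placeholder image path
print(f"Extracted in {time.perf_counter() - start:.2f} s")
```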
When I ran the model on the 89 samples, my local PC took about 5–6 minutes to extract them all, while the server took around 10 minutes. It will only get worse once the model is opened up to the public. Can you imagine how much time we would need with 500+ concurrent users hitting the model? I think it’s better to forget it #haha.
In the middle of my confusion, I remembered that my PC has two Nvidia GeForce RTX 3090 Ti cards sitting idle, since my local PC runs the Windows operating system. FYI, if you want to use your GPU for this on Windows, you need to install WSL first. So I moved my code to the WSL directory, and from there, the magic happened~
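A quick sanity check before anything else: make sure PyTorch inside WSL actually sees the cards. This is just the standard torch check, nothing specific to my setup:

```python
import torch

# Confirm that PyTorch inside WSL can see the GPUs.
print(torch.cuda.is_available())      # should print True
print(torch.cuda.device_count())      # 2 in my case (two RTX 3090 Ti)
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3090 Ti"
```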
First, I re-ran the single-image case on the GPU, and it took only 0.5–1 second. Second, I re-ran the 89 samples, and they finished in 25 seconds. It simply outperforms both my own CPU and the server’s CPU.
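The only change needed in the code itself is moving the model and its inputs to the GPU. A minimal sketch, reusing `processor`, `model`, and `Image` from the earlier snippet; the glob pattern for the sample images is a placeholder, not my real folder:

```python
import glob
import time

import torch

# Pick the GPU when it is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def extract_on_device(image_path: str) -> str:
    image = Image.open(image_path).convert("RGB")
    # The only difference from the CPU version: tensors live on the same device as the model.
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
    decoder_input_ids = processor.tokenizer(
        "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
    )
    return processor.batch_decode(outputs, skip_special_tokens=True)[0]

# Time the whole batch of images (placeholder file pattern).
paths = glob.glob("samples/*.jpg")
start = time.perf_counter()
results = [extract_on_device(p) for p in paths]
print(f"{len(paths)} images in {time.perf_counter() - start:.1f} s")
```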
So, if you want to run the ‘Donut’ 🍩 model, I recommend you use a GPU. It will save your life #haha.