GPU Optimization with Num_Workers not working #2354
Comments
Oh, and the newer GPU and the much weaker one train for the same length of time, so there is a bottleneck somewhere.
Hi @Laenita, would you mind sharing the values of your parameters, so that we can get an idea of the number of parameters / the size of the model? Is GPU utilisation at 1% for both the old and the new devices?
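For reference, a minimal sketch (not from the thread) of how the model size could be reported, assuming a recent Darts version where a fitted `TorchForecastingModel` exposes its underlying PyTorch module as `model.model`:

```python
import numpy as np
import pandas as pd

from darts import TimeSeries
from darts.models import NBEATSModel

# Tiny synthetic series, only used so that fit() instantiates the network.
series = TimeSeries.from_series(
    pd.Series(
        np.random.randn(200).astype(np.float32),
        index=pd.date_range("2020-01-01", periods=200, freq="D"),
    )
)

model = NBEATSModel(input_chunk_length=24, output_chunk_length=12, n_epochs=1)
model.fit(series)

# After fit(), the underlying PyTorch module is available as `model.model`.
n_total = sum(p.numel() for p in model.model.parameters())
n_trainable = sum(p.numel() for p in model.model.parameters() if p.requires_grad)
print(f"total parameters: {n_total:,} (trainable: {n_trainable:,})")
```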
Hi @madtoinou, of course, here are the parameters of my model, I hope this helps: And yes, both the old and the newer (and much faster) GPUs show only 1% utilisation and train for the same time on the same model, which indicates that something is wrong and heavily under-utilising the hardware. Also, num_loader_workers is not working at all for me: training takes more than an hour with num_loader_workers > 0. Thanks for your assistance!
Yes, I have the same problem: I am told that num_loader_workers is not a legit parameter.
Hi @igorrivin & @Laenita, as mentioned in another thread, PR #2295 adds support for those arguments. Maybe try installing that branch / copying the changes and see if it solves the bottleneck?
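For anyone wanting to try the PR before it is merged, one common route (assuming a local clone of the repository and the GitHub CLI) is `gh pr checkout 2295` followed by `pip install -e .`; copying the changed files into an existing installation, as suggested above, should work too.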
Hi @madtoinou, I have copied the changes from PR #2295, but the model now goes into a sanity-checking phase before training starts.
Which sanity checking are you referring to? |
Hi @madtoinou, the best explanation I can give is this PNG, where the model first goes into a "Sanity Checking" phase before training starts: [screenshot: training log showing the "Sanity Checking" phase]
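For context, that "Sanity Checking" phase comes from PyTorch Lightning, which runs a couple of validation batches before training starts. A hedged sketch of how it could be skipped, assuming a Darts version that forwards `pl_trainer_kwargs` to the underlying Lightning Trainer:

```python
from darts.models import NBEATSModel

# Hedged sketch: Darts forwards `pl_trainer_kwargs` to the PyTorch Lightning
# Trainer; `num_sanity_val_steps=0` skips the "Sanity Checking" pass that
# Lightning runs before the first training epoch.
model = NBEATSModel(
    input_chunk_length=24,   # hypothetical sizes, not the values from this thread
    output_chunk_length=12,
    pl_trainer_kwargs={"num_sanity_val_steps": 0},
)
```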
I am not very experienced, but I love this package. However, GPU acceleration only seems to utilize about 1% of my GPU. Increasing the batch size made my predictions far less accurate, and I read that increasing num_loader_workers should help, but I get a log message telling me to set persistent_workers=True in the val_dataloader, which I know Darts does not support, and then the model runs 5 times longer. Can you please assist? I just got a better GPU to reduce my training time, but I can't get the model to use more of it. Here is my model for reference:
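Since the model definition was not captured in this transcript, here is a hedged sketch of the knobs being discussed (all parameter values are hypothetical, and `num_loader_workers` is the fit() argument in Darts releases that predate PR #2295, which the thread says adds support for passing further DataLoader arguments such as persistent_workers):

```python
from darts.models import NBEATSModel

# Hedged sketch, not the OP's model: GPU selection goes through
# `pl_trainer_kwargs` (forwarded to the PyTorch Lightning Trainer), and the
# number of DataLoader workers is passed to fit() via `num_loader_workers`.
model = NBEATSModel(
    input_chunk_length=24,        # hypothetical values
    output_chunk_length=12,
    batch_size=64,
    pl_trainer_kwargs={
        "accelerator": "gpu",     # train on the GPU instead of the CPU
        "devices": [0],           # first CUDA device
    },
)

# `train_series` / `val_series` stand in for the user's own data.
# model.fit(train_series, val_series=val_series, num_loader_workers=4)
```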