
Support multi-GPU training #676

Open
srogatch opened this issue May 25, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@srogatch

So far I couldn't find a way to train on multiple GPUs within the same computer. If such a way exists, please describe how to do it.

@isaacmg
Collaborator

isaacmg commented Jun 8, 2023

Hello, sorry for the delay. We do currently have Docker containers which you can use with Wandb to perform a distributed hyper-parameter sweep. IMO multi-GPU training for a single model isn't much benefit: it is very hard to saturate even a single GPU unless you have huge batch sizes. The bottleneck generally comes from other things.
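
For reference, a distributed sweep along those lines typically follows the pattern below. This is only a rough sketch, not the actual FF Docker/Wandb setup; the project name, parameter names, and training function are placeholders. Each container would run its own agent against the same sweep ID.

```python
import wandb

# Hypothetical sweep configuration -- parameter names depend on the actual model config.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "validation_loss", "goal": "minimize"},
    "parameters": {
        "batch_size": {"values": [32, 64, 128]},
        "learning_rate": {"min": 1e-5, "max": 1e-2},
    },
}

def train():
    # Each agent invocation is one run; run.config holds the sampled hyper-parameters.
    run = wandb.init()
    batch_size = run.config.batch_size
    # ... build the model from run.config and train it here ...
    run.log({"validation_loss": 0.0})  # placeholder metric
    run.finish()

# Create the sweep once, then start an agent in each container/worker.
sweep_id = wandb.sweep(sweep_config, project="my-forecasting-project")
wandb.agent(sweep_id, function=train)
```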

@srogatch
Author

srogatch commented Jun 8, 2023

I have a batch size of 64, a history length of 1440, a lookahead of 480, and 2 million points in the time series, each consisting of 4 values. A single GPU is currently at 97-100% utilization, and judging from the power consumption it's indeed fully saturated, so I could benefit from multiple GPUs.

@isaacmg
Collaborator

isaacmg commented Jun 8, 2023

Interesting, I've never really run into that problem before. Let me look into it. FF is built on top of PyTorch, of course, so hopefully it is something I could add reasonably quickly. Out of the box, though, we don't currently support it, as we mainly use model.to().

@isaacmg isaacmg added the enhancement New feature or request label Jun 8, 2023
@srogatch
Author

srogatch commented Jun 9, 2023

Yes, we need to wrap the model in a DistributedDataParallel object, launch one process per GPU, get the local rank of each process, and use it as the device parameter in model.to(). I had planned to add this myself, but unfortunately I had to postpone this project because higher priorities came up.
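
A minimal sketch of that pattern, launched with torchrun so each process gets its own local rank. This is illustrative only, not FF code; the toy model, dataset, and script name are placeholders standing in for the existing training loop.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-ins for the real model and dataset; the wrapping pattern is what matters.
    model = nn.Linear(4, 1).to(local_rank)          # local rank used as the device in .to()
    model = DDP(model, device_ids=[local_rank])
    data = TensorDataset(torch.randn(2048, 4), torch.randn(2048, 1))
    sampler = DistributedSampler(data)              # shards batches across processes
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()
    for epoch in range(3):
        sampler.set_epoch(epoch)                    # different shuffle each epoch
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()         # DDP all-reduces gradients here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    # Launch with one process per GPU, e.g.:
    #   torchrun --nproc_per_node=4 ddp_sketch.py
    main()
```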
