I have encountered some challenges when using DeepSpeed that I hope you can help address.
While fine-tuning the LLama-7b-chat-hf and LLama-13b-chat-hf LLMs on multiple GPUs, I observed the following throughput: 1 GPU (60 tokens/s), 2 GPUs (178 tokens/s), 3 GPUs (230 tokens/s), and 4 GPUs (300 tokens/s). Surprisingly, throughput did not increase proportionally beyond two GPUs; the gain from 2 to 3 GPUs was much smaller than the gain from 1 to 2. Is there a technical explanation for this?
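To quantify the drop-off, here is a small sketch that computes speedup and per-GPU parallel efficiency from the numbers above (`scaling_report` is just an illustrative helper, not a DeepSpeed API):

```python
def scaling_report(measured):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU baseline.

    efficiency = speedup / gpu_count; 1.0 would mean perfect linear scaling.
    """
    baseline = measured[1]
    return {g: (t / baseline, t / baseline / g) for g, t in measured.items()}

# tokens/s figures from the runs described above
measured = {1: 60, 2: 178, 3: 230, 4: 300}
for g, (s, e) in scaling_report(measured).items():
    print(f"{g} GPU(s): speedup {s:.2f}x, parallel efficiency {e:.0%}")
```

Note that by this metric the 2-GPU run is actually superlinear relative to 1 GPU (178 > 2 × 60), and the efficiency then declines at 3 and 4 GPUs, which is the drop-off in question.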
Under identical conditions using the TRX50 motherboard, we compared the performance of two configurations:
Case 1: NV RTX 4090 x 2 cards
Case 2: AMD Radeon Pro W7900 x 2 cards
In tests two months ago, the AMD Radeon Pro W7900 outperformed the NVIDIA RTX 4090 in speed (tokens/s) on the LLama-7b-chat-hf and LLama-13b-chat-hf models. In my recent tests, however, the RTX 4090 surpassed the W7900, both with and without flash-attn enabled.
I would appreciate your insights on these issues. Is there an explanation for these performance fluctuations? Are certain versions of DeepSpeed optimized for specific GPU types, such as the NVIDIA RTX 4090 or the AMD W7900?
I would also like to ask why flash-attn does not support AMD GPUs (Radeon Pro W7800, W7900).
Thank you!
Le