
Meta-Llama-3-70B-Instruct running out of memory on 8 A100-40GB #183

Open
whatdhack opened this issue May 3, 2024 · 2 comments

@whatdhack

Describe the bug

Out of memory. Tried to allocate X.XX GiB .....

Minimal reproducible example

Any A100-40GB system with 8 GPUs should reproduce this, I guess:

python example_chat_completion.py
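
(For context, the 70B model normally has to be launched through torchrun with model parallelism 8 rather than plain python. The invocation below follows the README's pattern as I recall it; the checkpoint and tokenizer paths are placeholder assumptions:)

```
torchrun --nproc_per_node 8 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-70B-Instruct/ \
    --tokenizer_path Meta-Llama-3-70B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```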

Output

```
Out of memory. Tried to allocate X.XX GiB .....
```

Runtime Environment

  • Model: Meta-Llama-3-70B-Instruct
  • Using via huggingface?: no
  • OS: Linux
  • GPU VRAM: 40 GB
  • Number of GPUs: 8
  • GPU Make: Nvidia

Additional context
Is there a way to reduce the memory requirement? The most obvious trick, reducing the batch size, did not prevent the OOM.
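
For rough orientation, here is a back-of-the-envelope estimate of per-GPU memory at MP=8 in bf16, using the published 70B shape (80 layers, 8 KV heads, head_dim 128). This is an assumption-laden sketch, not a measurement:

```python
# Rough per-GPU memory for Llama-3-70B at model parallelism 8, bf16.
# Shape constants are from the published 70B config; everything else
# is an assumption.
params = 70e9                     # total parameters
bytes_per_param = 2               # bf16
mp = 8                            # model-parallel ranks

weights_per_gpu = params * bytes_per_param / mp / 2**30
print(f"weights per GPU: {weights_per_gpu:.1f} GiB")    # ~16.3 GiB

# KV cache: 80 layers, 8 KV heads (GQA) split across ranks, head_dim 128,
# K and V tensors, bf16, at the README's example generation settings.
layers, kv_heads, head_dim = 80, 8, 128
max_seq_len, max_batch_size = 512, 6
kv_bytes = (layers * (kv_heads / mp) * head_dim * 2 * bytes_per_param
            * max_seq_len * max_batch_size)
print(f"KV cache per GPU: {kv_bytes / 2**30:.2f} GiB")  # ~0.12 GiB
```

If these numbers are right, the ~16 GiB of weights per GPU nominally fits in 40 GB and the KV cache is tiny at these settings, which would explain why shrinking the batch does not help; presumably the peak happens during checkpoint loading and model initialization rather than steady-state decoding, though that is speculation.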

@whatdhack
Author

What is the best way to adapt the 70B model's 8 checkpoint shards (sized for A100-80GB/H100) to, say, 16 A100-40GB GPUs?
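
For what it's worth, one offline approach is to split each MP=8 shard in two along its model-parallel dimension. The sketch below is hypothetical, not something this repo ships, and assumes Meta's layer naming: column-parallel weights (wq/wk/wv/w1/w3/output) are sharded along dim 0, row-parallel and embedding weights (wo/w2/tok_embeddings) along dim 1. Big caveat: the 70B model has only 8 KV heads, and as far as I can tell Meta's attention code divides n_kv_heads by the model-parallel size, so MP=16 would additionally require replicating KV heads; the thread linked in the reply below is the better starting point.

```python
# reshard_mp8_to_mp16.py -- hypothetical sketch, not part of this repo.
import os
import torch

def shard_dim(name: str):
    """Dim along which a tensor is model-parallel-sharded, or None if replicated."""
    if any(k in name for k in ("wq.", "wk.", "wv.", "w1.", "w3.", "output.")):
        return 0  # ColumnParallelLinear: output features split across ranks
    if any(k in name for k in ("wo.", "w2.", "tok_embeddings.")):
        return 1  # RowParallelLinear / ParallelEmbedding: input dim split
    return None   # norms, rope freqs, etc. are replicated on every rank

os.makedirs("mp16", exist_ok=True)
for i in range(8):
    shard = torch.load(f"consolidated.{i:02d}.pth", map_location="cpu")
    halves = [{}, {}]
    for name, t in shard.items():
        d = shard_dim(name)
        if d is None:
            halves[0][name], halves[1][name] = t, t.clone()
        else:
            a, b = t.chunk(2, dim=d)  # naive split; see KV-head caveat above
            halves[0][name], halves[1][name] = a.clone(), b.clone()
    torch.save(halves[0], f"mp16/consolidated.{2*i:02d}.pth")
    torch.save(halves[1], f"mp16/consolidated.{2*i+1:02d}.pth")
```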

@subramen
Contributor

Please see this thread: #157 (comment)
