4× A100 (40 GB) cannot run it successfully. This looks like a memory issue. What parameters should I adjust to run the 72B model successfully?
On 2× A100, I set --max-batch-prefill-tokens and all four related parameters to 1, but it still fails:
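For reference, an invocation that passes those limits might look like the sketch below. This is an assumption, not the reporter's exact command: the model id is a placeholder, and the flag names are the TGI-style ones that `lorax-launcher` inherits; verify them against `lorax-launcher --help` for your version.

```shell
# Hypothetical launch; <your-72b-model> is a placeholder, flags assumed from
# the warmup log and TGI-style launchers -- check `lorax-launcher --help`.
lorax-launcher \
  --model-id <your-72b-model> \
  --num-shard 4 \
  --quantize bitsandbytes-nf4 \
  --max-input-length 200 \
  --max-total-tokens 1024 \
  --max-batch-prefill-tokens 300
```

Lowering the three token limits shrinks the warmup allocation, but if the weights themselves do not fit, only quantization or more shards will help.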
RuntimeError: Not enough memory to handle 300 prefill tokens. You need to decrease --max-batch-prefill-tokens
2024-04-04T14:14:39.297825Z ERROR warmup{max_input_length=200 max_prefill_tokens=300 max_total_tokens=1024}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: Not enough memory to handle 300 prefill tokens. You need to decrease --max-batch-prefill-tokens
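A back-of-envelope estimate (my own arithmetic, not from the logs) suggests why the warmup fails: 72B parameters in fp16/bf16 need roughly 134 GiB for the weights alone, which already exceeds 2× 40 GB and leaves almost no headroom for KV cache and prefill buffers on 4× 40 GB.

```python
# Back-of-envelope GPU memory estimate for a 72B-parameter model.
# Assumptions (not from the issue): fp16/bf16 weights at 2 bytes per
# parameter, no offloading; KV cache and activations are extra on top.

def weight_memory_gib(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """GiB needed just to hold the model weights."""
    return num_params_billions * 1e9 * bytes_per_param / 1024**3

weights = weight_memory_gib(72)
print(f"fp16 weights: {weights:.0f} GiB")
print(f"2 x A100-40G = {2 * 40} GiB -> weights fit: {weights < 2 * 40}")
print(f"4 x A100-40G = {4 * 40} GiB -> weights fit: {weights < 4 * 40}")
```

Under these assumptions, 2 GPUs cannot even hold the weights, and 4 GPUs hold them with only ~26 GiB total left for everything else, so even a 300-token prefill can push the warmup over the limit.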
Expected behavior
none
KrisWongz changed the title from "Need some help." to "Need some help. 'You need to decrease --max-batch-prefill-tokens.'" on Apr 5, 2024.
System Info
latest
Information
Tasks
Reproduction