[Performance]: Why is HF better than vLLM when using benchmark_throughput? #4702
Comments
Hi @yuki252111, when I tried this with a more realistic LLM size:

vLLM:
HF:
I have the same issue here. Do you have any update?
@AlexBlack2202 I ran two experiments.

Experiment 1: batch_size=20 (this makes continuous batching more effective).
Experiment 2: llama-7b-chat-hf, batch_size=20.
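For reference, here is a minimal sketch of what the HF side of Experiment 2 might look like with static batches of 20. The model name matches the comment; the prompts, generation length, and sampling settings are assumptions. With static batching, every sequence in a batch is stepped until the longest one finishes, which is exactly the inefficiency that vLLM's continuous batching removes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.eos_token      # Llama ships without a pad token
tok.padding_side = "left"          # decoder-only models should be left-padded for generation
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

prompts = ["Hello, my name is"] * 100        # assumed prompt set
for i in range(0, len(prompts), 20):         # static batch_size=20
    batch = tok(prompts[i : i + 20], return_tensors="pt", padding=True).to(model.device)
    # every sequence in this batch runs until the longest one is done
    out = model.generate(**batch, max_new_tokens=128, do_sample=True)
```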
This result seems to make sense.

Thank you very much.
When I run the throughput benchmark on an H800, the results are confusing. Why is HF better than vLLM? Is anything wrong with how I run the script?

vLLM: Throughput: 59.50 requests/s, 15231.62 tokens/s
HF: Throughput: 108.34 requests/s, 27736.31 tokens/s
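For anyone reproducing this, here is a minimal sketch of the same measurement in plain Python (the model, prompts, and generation length are assumptions; vLLM's `benchmarks/benchmark_throughput.py` is the authoritative version and, in recent releases, can also run the HF baseline via `--backend hf`):

```python
import time
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"] * 1000  # assumed prompt set
params = SamplingParams(temperature=0.8, max_tokens=128, ignore_eos=True)

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed model
start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# count prompt + generated tokens, similar to what the benchmark script reports
total_tokens = sum(
    len(o.prompt_token_ids) + len(o.outputs[0].token_ids) for o in outputs
)
print(f"Throughput: {len(prompts) / elapsed:.2f} requests/s, "
      f"{total_tokens / elapsed:.2f} tokens/s")
```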