What configuration is appropriate for this? Running the 7B model on a single A100 (10 GB of the 80 GB VRAM in use), the "how to get to Beijing" example case takes 60 seconds to return a result #310

Open
MetaRunning opened this issue Apr 3, 2024 · 2 comments

Comments

@MetaRunning

As titled.

@Rayrtfr
Collaborator

Rayrtfr commented Apr 22, 2024

> As titled.

Try an inference acceleration method; a simple option is vLLM: https://github.com/LlamaFamily/Llama-Chinese/tree/main/inference-speed/GPU/vllm_example
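For reference, a minimal sketch of the vLLM approach along the lines of the linked example (assumptions: vLLM is installed via `pip install vllm`, and the model name below is a placeholder for whichever 7B checkpoint you are actually serving):

```python
from vllm import LLM, SamplingParams

# Load the 7B model; vLLM pre-allocates the KV cache (PagedAttention),
# which is why it claims more VRAM up front than a plain transformers load.
llm = LLM(model="FlagAlpha/Atom-7B-Chat")  # placeholder -- use your checkpoint path

params = SamplingParams(temperature=0.3, top_p=0.95, max_tokens=256)

# Even for a single prompt, vLLM's optimized decode loop is typically much
# faster than a naive generate() call, which should cut the ~60 s latency.
outputs = llm.generate(["How do I get to Beijing?"], params)
print(outputs[0].outputs[0].text)
```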

@MetaRunning
Author

> Try an inference acceleration method; a simple option is vLLM: https://github.com/LlamaFamily/Llama-Chinese/tree/main/inference-speed/GPU/vllm_example

OK, thanks. I'll give it a try and see how it performs.
