[Bug] Does 0.4.0 not support qwen1.5 110b yet? #1536
Comments
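It is supported.
Please see the notes on memory allocation in the pipeline.md documentation.
With vllm I can run 8K tokens; the following command runs normally:
Does lmdeploy consume more memory?
Currently the vocabulary (embedding) table is not split across TP ranks, and because Qwen's vocabulary is especially large, the impact is quite noticeable. I'll look into how to add that.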
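To make the point about the unsplit vocabulary table concrete, here is a rough sketch of the arithmetic. The vocabulary size (152064) and hidden size (8192) are assumptions about the Qwen1.5-110B config and are not stated in this thread, and `vocab_table_gib` is a hypothetical helper, not part of lmdeploy.

```python
# Rough estimate of how much GPU memory one vocabulary-sized table
# (e.g. the input embedding) takes per GPU, with and without TP splitting.
# ASSUMPTIONS (not from the thread): vocab_size=152064, hidden_size=8192, FP16.

GIB = 2 ** 30


def vocab_table_gib(vocab_size, hidden_size, bytes_per_elem=2, tp=1, split=False):
    """Per-GPU size in GiB of a [vocab_size, hidden_size] table.

    If split is False, every TP rank holds a full copy.
    If split is True, the table is divided evenly across tp ranks.
    """
    total_bytes = vocab_size * hidden_size * bytes_per_elem
    per_gpu_bytes = total_bytes / tp if split else total_bytes
    return per_gpu_bytes / GIB


if __name__ == "__main__":
    full = vocab_table_gib(152064, 8192, tp=8, split=False)
    sharded = vocab_table_gib(152064, 8192, tp=8, split=True)
    print(f"replicated on each GPU: {full:.2f} GiB")    # about 2.32 GiB
    print(f"split across 8 GPUs:    {sharded:.2f} GiB")  # about 0.29 GiB
```

If the output projection is a separate table of the same shape, the replicated cost roughly doubles, which is a meaningful share of a 32G card.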
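OK, thanks.
Folks, is there any hope of running qwen110b on 8x V100 32G? This framework's performance is really strong: running 72b it is much better than vllm, it is fast, and it can use the full 32K tokens. Really looking forward to it.
It will be supported, but not that soon; probably in about two weeks.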
Checklist
Describe the bug
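Running the qwen110b model with version 0.4.0 keeps failing with an out-of-memory error, and --quant-policy 4 does not help either. Is it not supported yet?
With vllm, both qwen110b FP16 and GPTQ-Int4 run normally.
My machine has 8x V100 32G.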
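As a back-of-the-envelope check on why this setup is tight, the sketch below estimates the per-GPU weight footprint. It assumes roughly 110e9 parameters, evenly sharded weights, and 32 GiB usable per card; none of these figures are measured, and the helper names are hypothetical. It also ignores the CUDA context, activations, and any tensors that are replicated rather than split.

```python
# Rough per-GPU weight memory for a model sharded evenly over `tp` GPUs.
# ASSUMPTIONS (not from the thread): ~110e9 parameters, perfectly even
# sharding, 32 GiB usable per V100.

GIB = 2 ** 30


def weights_per_gpu_gib(num_params, bytes_per_param, tp):
    """Weight memory per GPU in GiB, assuming an even split across tp GPUs."""
    return num_params * bytes_per_param / tp / GIB


def headroom_gib(gpu_mem_gib, num_params, bytes_per_param, tp):
    """Memory left per GPU after weights, before KV cache and runtime overhead."""
    return gpu_mem_gib - weights_per_gpu_gib(num_params, bytes_per_param, tp)


if __name__ == "__main__":
    fp16 = weights_per_gpu_gib(110e9, 2, tp=8)
    int4 = weights_per_gpu_gib(110e9, 0.5, tp=8)
    print(f"FP16 weights per GPU: {fp16:.1f} GiB")  # about 25.6 GiB
    print(f"INT4 weights per GPU: {int4:.1f} GiB")  # about 6.4 GiB
    print(f"FP16 headroom on 32 GiB: {headroom_gib(32, 110e9, 2, tp=8):.1f} GiB")
```

Under these assumptions FP16 weights leave only about 6 GiB per card for everything else. As far as I understand, --quant-policy controls KV cache quantization rather than weight precision, so it would not shrink the weight footprint estimated here.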
Reproduction
lmdeploy serve api_server /Qwen1.5-110B-Chat --server-port 23333 --model-name qwen110b --tp 8 --log-level INFO --quant-policy 4
Environment
Error traceback
No response