[Bug] OOM when quantizing llama3 70b on a 3090 #1562
Comments
You can try setting calib_seqlen to a smaller value.
Thanks a lot! `lmdeploy lite auto_awq /Meta-Llama-3-70B-Instruct --work-dir /Meta-Llama-3-70B-Instruct-4bit --calib-seqlen 1024` completed successfully. By the way, roughly how much accuracy does AWQ w4a16 quantization lose? Does lmdeploy provide a tool to measure PPL directly?
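One general way to measure perplexity, independent of any lmdeploy-specific tooling, is to load the quantized checkpoint through Hugging Face transformers and compute the average token-level cross-entropy over a held-out corpus. Below is a minimal sketch, assuming the AWQ checkpoint loads via `AutoModelForCausalLM` and using wikitext-2 as the evaluation set; the model path, window size, and stride are illustrative.

```python
# Minimal perplexity sketch (generic transformers approach, not lmdeploy tooling).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/Meta-Llama-3-70B-Instruct-4bit"  # assumed loadable via transformers
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

max_len, stride = 2048, 2048  # non-overlapping windows for simplicity
nlls, n_tokens = [], 0
for begin in range(0, input_ids.size(1) - 1, stride):
    end = min(begin + max_len, input_ids.size(1))
    ids = input_ids[:, begin:end].to(model.device)
    with torch.no_grad():
        # labels == inputs: HF shifts internally and returns mean NLL per token
        out = model(ids, labels=ids)
    nlls.append(out.loss * (ids.size(1) - 1))
    n_tokens += ids.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"perplexity: {ppl.item():.2f}")
```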
It looks like the quantization simply failed. lmdeploy's quantization currently has no ability to search the ratio; you can use auto_awq, or try again with the next lmdeploy release.
@AllentDan Based on your recent work, does quantizing llama3-70b give normal results?
Earlier versions could already quantize llava3_llama70b, and chat worked fine. The weights may have changed after fine-tuning with xtuner. I haven't run the scale search on a 70b model yet; it is very time-consuming.
auto_awq is exactly what was used here.
No fine-tuning was done; these are the original weights. Are the scales used by auto_awq generated in advance?
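For context on what "searching scales" involves: AWQ derives per-input-channel scales from calibration activations, typically by grid-searching an exponent that trades off activation magnitude against quantization error, and keeping the scale that best preserves the layer's output on the calibration batch. Below is a conceptual sketch of that search; it is illustrative only, not lmdeploy's actual implementation, and the function names, grid size, and group-wise fake quantizer are all assumptions.

```python
# Conceptual sketch of AWQ-style per-channel scale search; illustrative only.
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group: int = 128) -> torch.Tensor:
    """Fake-quantize weights group-wise to n_bits, then dequantize back."""
    orig = w.shape
    w = w.reshape(-1, group)
    scale = w.abs().amax(dim=1, keepdim=True) / (2 ** (n_bits - 1) - 1)
    return (torch.round(w / scale) * scale).reshape(orig)

def search_awq_scale(w: torch.Tensor, x: torch.Tensor, n_grid: int = 20) -> torch.Tensor:
    """Grid-search alpha; scale = act_mag ** alpha; keep the min-MSE scale.
    w: [out, in] linear weight; x: [tokens, in] calibration activations."""
    act_mag = x.abs().mean(dim=0).clamp(min=1e-4)   # per-input-channel statistics
    ref = x @ w.t()                                  # full-precision reference output
    best_err, best_scale = float("inf"), None
    for i in range(n_grid):
        alpha = i / n_grid
        s = act_mag.pow(alpha)
        s = s / (s.max() * s.min()).sqrt()           # normalize the scale range
        wq = pseudo_quantize(w * s)                  # scale weights up, then quantize
        err = ((x / s) @ wq.t() - ref).pow(2).mean() # activations scaled down to compensate
        if err < best_err:
            best_err, best_scale = err.item(), s
    return best_scale
```

Because this search runs a forward pass per grid point for every linear layer, it grows expensive quickly at 70B scale, which matches the comment above about it being very time-consuming.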
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
Checklist
Describe the bug
On a 3090 GPU with 24 GB of VRAM, quantizing llama3 70b with lmdeploy lite auto_awq runs out of GPU memory at layer 79, even after setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True as suggested.
nvidia-smi shows that only the first GPU is being used. Is it possible to use multiple GPUs?
Reproduction
lmdeploy lite auto_awq /Meta-Llama-3-70B-Instruct --work-dir /Meta-Llama-3-70B-Instruct-4bit
Environment
Error traceback
No response