
[BUG] Qwen-1.8-Chat converted to f16 with llama.cpp gives garbled answers at inference. Is 1.8B not yet supported by llama.cpp? #69

Open
Lyzin opened this issue Dec 26, 2023 · 3 comments

Lyzin commented Dec 26, 2023

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

First, convert the model to f16 using the llama.cpp project:
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/

Then run inference:
./main -m ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

But the answers are garbled. Does llama.cpp not support quantizing the 1.8B model?
[screenshot: garbled model output]

I also tried converting to an int4-quantized version and got the same garbled answers.
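(For context, the int4 step is not shown in the report. With llama.cpp it would typically be done with the bundled quantize tool; the sketch below reuses the file paths from the commands in this issue, but the exact invocation is an assumption, not the reporter's command.)

# sketch: produce a q4_0 (int4) GGUF from the f16 GGUF converted above
./quantize ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0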

Expected Behavior

The model should answer normally.

Steps To Reproduce

Download the llama.cpp project
Download the Qwen-1_8B-Chat model
Convert the model to f16 precision
Quantize to an int4 version and run inference

Inference then produces garbled, unreadable answers.

Environment

- OS: macOS
- Python: 3.9
- Transformers:
- PyTorch: 
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

Lyzin commented Dec 26, 2023

I verified this again. llama.cpp enables Metal by default on macOS, so the compiled main binary uses the Mac's GPU for inference by default, and that is when the unreadable answers appear. With GPU inference disabled on macOS, the answers are completely normal. Could the team please check whether this is indeed the problem?

Full launch command

Disable macOS GPU inference by adding the -ngl 0 flag:

./main -m ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -ngl 0 -f prompts/chat-with-qwen.txt
[screenshot: normal answers with -ngl 0]
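(For comparison, GPU offload is re-enabled by passing a non-zero layer count to -ngl, as a later comment in this thread does with -ngl 1. The command below is a sketch that mirrors the one above with offload turned back on; it is not taken from the report.)

./main -m ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml -ngl 1 -f prompts/chat-with-qwen.txt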

ban-shi-yi-sheng commented
Impressive that you even got the conversion to work; the llama.cpp conversion fails for me. llama.cpp has moved to the GGUF format, so why does the newly released qwen.cpp still convert to the GGML format? Could it convert seamlessly to GGUF? Then the model could be used with llama.cpp, and its server could run it as well.

ban-shi-yi-sheng commented
My mistake, I hadn't looked closely enough: python3 convert-hf-to-gguf.py does the conversion. After converting I quantized to q8_0. At first the output did go haywire a few times, but now it seems fine...

./main -m /Users/xxxx/AI/Models/Qwen-14B-Chat/ggml-model-f16-q8_0.gguf \
--color -i -ngl 1 -c 4096 -t 8 --temp 0.5 --top_k 40 --top_p 0.9 --repeat_penalty 1.1 -f /Users/xxxxx/AI/llama.cpp/prompts/chat-with-qwen.txt -cml
[screenshot: model output]
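(The q8_0 quantization step mentioned above is not shown. With llama.cpp's quantize tool it would presumably look like the sketch below; the output file name comes from the command above, while the input f16 file name is an assumption.)

# sketch: quantize the converted f16 GGUF to q8_0
./quantize /Users/xxxx/AI/Models/Qwen-14B-Chat/ggml-model-f16.gguf /Users/xxxx/AI/Models/Qwen-14B-Chat/ggml-model-f16-q8_0.gguf q8_0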

jklj077 transferred this issue from QwenLM/Qwen Dec 29, 2023