Quantization (q4_k_m gguf) failed for Phi-3 #413

Open
Li-Yanzhi opened this issue May 2, 2024 · 10 comments
Labels
fixed - pending confirmation (Fixed, waiting for confirmation from poster)

Comments

@Li-Yanzhi

When running the Alpaca + Phi-3 3.8b full example.ipynb notebook at https://colab.research.google.com/drive/1NvkBmkHfucGO3Ve9s1NKZvMNlw5p83ym?usp=sharing, the last step, which saves the quantized model:

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

fails with the following error:

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 5.96 out of 12.67 RAM for saving.
100%|██████████| 32/32 [00:01<00:00, 19.69it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving model/pytorch_model-00001-of-00002.bin...
Unsloth: Saving model/pytorch_model-00002-of-00002.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
 "-____-"     In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at model into f16 GGUF format.
The output location will be ./model-unsloth.F16.gguf
This will take 3 minutes...
Loading model file model/pytorch_model-00001-of-00002.bin
Loading model file model/pytorch_model-00001-of-00002.bin
Loading model file model/pytorch_model-00002-of-00002.bin
params = Params(n_vocab=32064, n_embd=3072, n_layer=32, n_ctx=4096, n_ff=8192, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('model'))
Loaded vocab file PosixPath('model/tokenizer.model'), type 'spm'
Vocab info: <SentencePieceVocab with 32000 base tokens and 11 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 32000, 'unk': 0, 'pad': 32000}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight                        -> token_embd.weight                        | F16    | [32064, 3072]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | F16    | [3072, 3072]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | F16    | [3072]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | F16    | [3072]
model.layers.1.self_attn.q_proj.weight           -> blk.1.attn_q.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.k_proj.weight           -> blk.1.attn_k.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.v_proj.weight           -> blk.1.attn_v.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.o_proj.weight           -> blk.1.attn_output.weight                 | F16    | [3072, 3072]
model.layers.1.mlp.gate_proj.weight              -> blk.1.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.1.mlp.up_proj.weight                -> blk.1.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.1.mlp.down_proj.weight              -> blk.1.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.1.input_layernorm.weight            -> blk.1.attn_norm.weight                   | F16    | [3072]
model.layers.1.post_attention_layernorm.weight   -> blk.1.ffn_norm.weight                    | F16    | [3072]
model.layers.2.self_attn.q_proj.weight           -> blk.2.attn_q.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.k_proj.weight           -> blk.2.attn_k.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.v_proj.weight           -> blk.2.attn_v.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.o_proj.weight           -> blk.2.attn_output.weight                 | F16    | [3072, 3072]
model.layers.2.mlp.gate_proj.weight              -> blk.2.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.2.mlp.up_proj.weight                -> blk.2.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.2.mlp.down_proj.weight              -> blk.2.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.2.input_layernorm.weight            -> blk.2.attn_norm.weight                   | F16    | [3072]
model.layers.2.post_attention_layernorm.weight   -> blk.2.ffn_norm.weight                    | F16    | [3072]
model.layers.3.self_attn.q_proj.weight           -> blk.3.attn_q.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.k_proj.weight           -> blk.3.attn_k.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.v_proj.weight           -> blk.3.attn_v.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.o_proj.weight           -> blk.3.attn_output.weight                 | F16    | [3072, 3072]
model.layers.3.mlp.gate_proj.weight              -> blk.3.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.3.mlp.up_proj.weight                -> blk.3.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.3.mlp.down_proj.weight              -> blk.3.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.3.input_layernorm.weight            -> blk.3.attn_norm.weight                   | F16    | [3072]
model.layers.3.post_attention_layernorm.weight   -> blk.3.ffn_norm.weight                    | F16    | [3072]
model.layers.4.self_attn.q_proj.weight           -> blk.4.attn_q.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.k_proj.weight           -> blk.4.attn_k.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.v_proj.weight           -> blk.4.attn_v.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.o_proj.weight           -> blk.4.attn_output.weight                 | F16    | [3072, 3072]
model.layers.4.mlp.gate_proj.weight              -> blk.4.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.4.mlp.up_proj.weight                -> blk.4.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.4.mlp.down_proj.weight              -> blk.4.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.4.input_layernorm.weight            -> blk.4.attn_norm.weight                   | F16    | [3072]
model.layers.4.post_attention_layernorm.weight   -> blk.4.ffn_norm.weight                    | F16    | [3072]
model.layers.5.self_attn.q_proj.weight           -> blk.5.attn_q.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.k_proj.weight           -> blk.5.attn_k.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.v_proj.weight           -> blk.5.attn_v.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.o_proj.weight           -> blk.5.attn_output.weight                 | F16    | [3072, 3072]
model.layers.5.mlp.gate_proj.weight              -> blk.5.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.5.mlp.up_proj.weight                -> blk.5.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.5.mlp.down_proj.weight              -> blk.5.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.5.input_layernorm.weight            -> blk.5.attn_norm.weight                   | F16    | [3072]
model.layers.5.post_attention_layernorm.weight   -> blk.5.ffn_norm.weight                    | F16    | [3072]
model.layers.6.self_attn.q_proj.weight           -> blk.6.attn_q.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.k_proj.weight           -> blk.6.attn_k.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.v_proj.weight           -> blk.6.attn_v.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.o_proj.weight           -> blk.6.attn_output.weight                 | F16    | [3072, 3072]
model.layers.6.mlp.gate_proj.weight              -> blk.6.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.6.mlp.up_proj.weight                -> blk.6.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.6.mlp.down_proj.weight              -> blk.6.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.6.input_layernorm.weight            -> blk.6.attn_norm.weight                   | F16    | [3072]
model.layers.6.post_attention_layernorm.weight   -> blk.6.ffn_norm.weight                    | F16    | [3072]
model.layers.7.self_attn.q_proj.weight           -> blk.7.attn_q.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.k_proj.weight           -> blk.7.attn_k.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.v_proj.weight           -> blk.7.attn_v.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.o_proj.weight           -> blk.7.attn_output.weight                 | F16    | [3072, 3072]
model.layers.7.mlp.gate_proj.weight              -> blk.7.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.7.mlp.up_proj.weight                -> blk.7.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.7.mlp.down_proj.weight              -> blk.7.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.7.input_layernorm.weight            -> blk.7.attn_norm.weight                   | F16    | [3072]
model.layers.7.post_attention_layernorm.weight   -> blk.7.ffn_norm.weight                    | F16    | [3072]
model.layers.8.self_attn.q_proj.weight           -> blk.8.attn_q.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.k_proj.weight           -> blk.8.attn_k.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.v_proj.weight           -> blk.8.attn_v.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.o_proj.weight           -> blk.8.attn_output.weight                 | F16    | [3072, 3072]
model.layers.8.mlp.gate_proj.weight              -> blk.8.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.8.mlp.up_proj.weight                -> blk.8.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.8.mlp.down_proj.weight              -> blk.8.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.8.input_layernorm.weight            -> blk.8.attn_norm.weight                   | F16    | [3072]
model.layers.8.post_attention_layernorm.weight   -> blk.8.ffn_norm.weight                    | F16    | [3072]
model.layers.9.self_attn.q_proj.weight           -> blk.9.attn_q.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.k_proj.weight           -> blk.9.attn_k.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.v_proj.weight           -> blk.9.attn_v.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.o_proj.weight           -> blk.9.attn_output.weight                 | F16    | [3072, 3072]
model.layers.9.mlp.gate_proj.weight              -> blk.9.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.9.mlp.up_proj.weight                -> blk.9.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.9.mlp.down_proj.weight              -> blk.9.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.9.input_layernorm.weight            -> blk.9.attn_norm.weight                   | F16    | [3072]
model.layers.9.post_attention_layernorm.weight   -> blk.9.ffn_norm.weight                    | F16    | [3072]
model.layers.10.self_attn.q_proj.weight          -> blk.10.attn_q.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.k_proj.weight          -> blk.10.attn_k.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.v_proj.weight          -> blk.10.attn_v.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.o_proj.weight          -> blk.10.attn_output.weight                | F16    | [3072, 3072]
model.layers.10.mlp.gate_proj.weight             -> blk.10.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.10.mlp.up_proj.weight               -> blk.10.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.10.mlp.down_proj.weight             -> blk.10.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.10.input_layernorm.weight           -> blk.10.attn_norm.weight                  | F16    | [3072]
model.layers.10.post_attention_layernorm.weight  -> blk.10.ffn_norm.weight                   | F16    | [3072]
model.layers.11.self_attn.q_proj.weight          -> blk.11.attn_q.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.k_proj.weight          -> blk.11.attn_k.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.v_proj.weight          -> blk.11.attn_v.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.o_proj.weight          -> blk.11.attn_output.weight                | F16    | [3072, 3072]
model.layers.11.mlp.gate_proj.weight             -> blk.11.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.11.mlp.up_proj.weight               -> blk.11.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.11.mlp.down_proj.weight             -> blk.11.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.11.input_layernorm.weight           -> blk.11.attn_norm.weight                  | F16    | [3072]
model.layers.11.post_attention_layernorm.weight  -> blk.11.ffn_norm.weight                   | F16    | [3072]
model.layers.12.self_attn.q_proj.weight          -> blk.12.attn_q.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.k_proj.weight          -> blk.12.attn_k.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.v_proj.weight          -> blk.12.attn_v.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.o_proj.weight          -> blk.12.attn_output.weight                | F16    | [3072, 3072]
model.layers.12.mlp.gate_proj.weight             -> blk.12.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.12.mlp.up_proj.weight               -> blk.12.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.12.mlp.down_proj.weight             -> blk.12.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.12.input_layernorm.weight           -> blk.12.attn_norm.weight                  | F16    | [3072]
model.layers.12.post_attention_layernorm.weight  -> blk.12.ffn_norm.weight                   | F16    | [3072]
model.layers.13.self_attn.q_proj.weight          -> blk.13.attn_q.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.k_proj.weight          -> blk.13.attn_k.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.v_proj.weight          -> blk.13.attn_v.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.o_proj.weight          -> blk.13.attn_output.weight                | F16    | [3072, 3072]
model.layers.13.mlp.gate_proj.weight             -> blk.13.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.13.mlp.up_proj.weight               -> blk.13.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.13.mlp.down_proj.weight             -> blk.13.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.13.input_layernorm.weight           -> blk.13.attn_norm.weight                  | F16    | [3072]
model.layers.13.post_attention_layernorm.weight  -> blk.13.ffn_norm.weight                   | F16    | [3072]
model.layers.14.self_attn.q_proj.weight          -> blk.14.attn_q.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.k_proj.weight          -> blk.14.attn_k.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.v_proj.weight          -> blk.14.attn_v.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.o_proj.weight          -> blk.14.attn_output.weight                | F16    | [3072, 3072]
model.layers.14.mlp.gate_proj.weight             -> blk.14.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.14.mlp.up_proj.weight               -> blk.14.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.14.mlp.down_proj.weight             -> blk.14.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.14.input_layernorm.weight           -> blk.14.attn_norm.weight                  | F16    | [3072]
model.layers.14.post_attention_layernorm.weight  -> blk.14.ffn_norm.weight                   | F16    | [3072]
model.layers.15.self_attn.q_proj.weight          -> blk.15.attn_q.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.k_proj.weight          -> blk.15.attn_k.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.v_proj.weight          -> blk.15.attn_v.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.o_proj.weight          -> blk.15.attn_output.weight                | F16    | [3072, 3072]
model.layers.15.mlp.gate_proj.weight             -> blk.15.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.15.mlp.up_proj.weight               -> blk.15.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.15.mlp.down_proj.weight             -> blk.15.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.15.input_layernorm.weight           -> blk.15.attn_norm.weight                  | F16    | [3072]
model.layers.15.post_attention_layernorm.weight  -> blk.15.ffn_norm.weight                   | F16    | [3072]
model.layers.16.self_attn.q_proj.weight          -> blk.16.attn_q.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.k_proj.weight          -> blk.16.attn_k.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.v_proj.weight          -> blk.16.attn_v.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.o_proj.weight          -> blk.16.attn_output.weight                | F16    | [3072, 3072]
model.layers.16.mlp.gate_proj.weight             -> blk.16.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.16.mlp.up_proj.weight               -> blk.16.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.16.mlp.down_proj.weight             -> blk.16.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.16.input_layernorm.weight           -> blk.16.attn_norm.weight                  | F16    | [3072]
model.layers.16.post_attention_layernorm.weight  -> blk.16.ffn_norm.weight                   | F16    | [3072]
model.layers.17.self_attn.q_proj.weight          -> blk.17.attn_q.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.k_proj.weight          -> blk.17.attn_k.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.v_proj.weight          -> blk.17.attn_v.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.o_proj.weight          -> blk.17.attn_output.weight                | F16    | [3072, 3072]
model.layers.17.mlp.gate_proj.weight             -> blk.17.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.17.mlp.up_proj.weight               -> blk.17.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.17.mlp.down_proj.weight             -> blk.17.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.17.input_layernorm.weight           -> blk.17.attn_norm.weight                  | F16    | [3072]
model.layers.17.post_attention_layernorm.weight  -> blk.17.ffn_norm.weight                   | F16    | [3072]
model.layers.18.self_attn.q_proj.weight          -> blk.18.attn_q.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.k_proj.weight          -> blk.18.attn_k.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.v_proj.weight          -> blk.18.attn_v.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.o_proj.weight          -> blk.18.attn_output.weight                | F16    | [3072, 3072]
model.layers.18.mlp.gate_proj.weight             -> blk.18.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.18.mlp.up_proj.weight               -> blk.18.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.18.mlp.down_proj.weight             -> blk.18.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.18.input_layernorm.weight           -> blk.18.attn_norm.weight                  | F16    | [3072]
model.layers.18.post_attention_layernorm.weight  -> blk.18.ffn_norm.weight                   | F16    | [3072]
model.layers.19.self_attn.q_proj.weight          -> blk.19.attn_q.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.k_proj.weight          -> blk.19.attn_k.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.v_proj.weight          -> blk.19.attn_v.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.o_proj.weight          -> blk.19.attn_output.weight                | F16    | [3072, 3072]
model.layers.19.mlp.gate_proj.weight             -> blk.19.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.19.mlp.up_proj.weight               -> blk.19.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.19.mlp.down_proj.weight             -> blk.19.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.19.input_layernorm.weight           -> blk.19.attn_norm.weight                  | F16    | [3072]
model.layers.19.post_attention_layernorm.weight  -> blk.19.ffn_norm.weight                   | F16    | [3072]
model.layers.20.self_attn.q_proj.weight          -> blk.20.attn_q.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.k_proj.weight          -> blk.20.attn_k.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.v_proj.weight          -> blk.20.attn_v.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.o_proj.weight          -> blk.20.attn_output.weight                | F16    | [3072, 3072]
model.layers.20.mlp.gate_proj.weight             -> blk.20.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.20.mlp.up_proj.weight               -> blk.20.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.20.mlp.down_proj.weight             -> blk.20.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.20.input_layernorm.weight           -> blk.20.attn_norm.weight                  | F16    | [3072]
model.layers.20.post_attention_layernorm.weight  -> blk.20.ffn_norm.weight                   | F16    | [3072]
model.layers.21.self_attn.q_proj.weight          -> blk.21.attn_q.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.k_proj.weight          -> blk.21.attn_k.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.v_proj.weight          -> blk.21.attn_v.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.o_proj.weight          -> blk.21.attn_output.weight                | F16    | [3072, 3072]
model.layers.21.mlp.gate_proj.weight             -> blk.21.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.21.mlp.up_proj.weight               -> blk.21.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.21.mlp.down_proj.weight             -> blk.21.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.21.input_layernorm.weight           -> blk.21.attn_norm.weight                  | F16    | [3072]
model.layers.21.post_attention_layernorm.weight  -> blk.21.ffn_norm.weight                   | F16    | [3072]
model.layers.22.self_attn.q_proj.weight          -> blk.22.attn_q.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.k_proj.weight          -> blk.22.attn_k.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.v_proj.weight          -> blk.22.attn_v.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.o_proj.weight          -> blk.22.attn_output.weight                | F16    | [3072, 3072]
model.layers.22.mlp.gate_proj.weight             -> blk.22.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.22.mlp.up_proj.weight               -> blk.22.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.22.mlp.down_proj.weight             -> blk.22.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.22.input_layernorm.weight           -> blk.22.attn_norm.weight                  | F16    | [3072]
model.layers.22.post_attention_layernorm.weight  -> blk.22.ffn_norm.weight                   | F16    | [3072]
model.layers.23.self_attn.q_proj.weight          -> blk.23.attn_q.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.k_proj.weight          -> blk.23.attn_k.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.v_proj.weight          -> blk.23.attn_v.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.o_proj.weight          -> blk.23.attn_output.weight                | F16    | [3072, 3072]
model.layers.23.mlp.gate_proj.weight             -> blk.23.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.23.mlp.up_proj.weight               -> blk.23.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.23.mlp.down_proj.weight             -> blk.23.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.23.input_layernorm.weight           -> blk.23.attn_norm.weight                  | F16    | [3072]
model.layers.23.post_attention_layernorm.weight  -> blk.23.ffn_norm.weight                   | F16    | [3072]
model.layers.24.self_attn.q_proj.weight          -> blk.24.attn_q.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.k_proj.weight          -> blk.24.attn_k.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.v_proj.weight          -> blk.24.attn_v.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.o_proj.weight          -> blk.24.attn_output.weight                | F16    | [3072, 3072]
model.layers.24.mlp.gate_proj.weight             -> blk.24.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.24.mlp.up_proj.weight               -> blk.24.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.24.mlp.down_proj.weight             -> blk.24.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.24.input_layernorm.weight           -> blk.24.attn_norm.weight                  | F16    | [3072]
model.layers.24.post_attention_layernorm.weight  -> blk.24.ffn_norm.weight                   | F16    | [3072]
model.layers.25.self_attn.q_proj.weight          -> blk.25.attn_q.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.k_proj.weight          -> blk.25.attn_k.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.v_proj.weight          -> blk.25.attn_v.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.o_proj.weight          -> blk.25.attn_output.weight                | F16    | [3072, 3072]
model.layers.25.mlp.gate_proj.weight             -> blk.25.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.25.mlp.up_proj.weight               -> blk.25.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.25.mlp.down_proj.weight             -> blk.25.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.25.input_layernorm.weight           -> blk.25.attn_norm.weight                  | F16    | [3072]
model.layers.25.post_attention_layernorm.weight  -> blk.25.ffn_norm.weight                   | F16    | [3072]
model.layers.26.self_attn.q_proj.weight          -> blk.26.attn_q.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.k_proj.weight          -> blk.26.attn_k.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.v_proj.weight          -> blk.26.attn_v.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.o_proj.weight          -> blk.26.attn_output.weight                | F16    | [3072, 3072]
model.layers.26.mlp.gate_proj.weight             -> blk.26.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.26.mlp.up_proj.weight               -> blk.26.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.26.mlp.down_proj.weight             -> blk.26.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.26.input_layernorm.weight           -> blk.26.attn_norm.weight                  | F16    | [3072]
model.layers.26.post_attention_layernorm.weight  -> blk.26.ffn_norm.weight                   | F16    | [3072]
model.layers.27.self_attn.q_proj.weight          -> blk.27.attn_q.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.k_proj.weight          -> blk.27.attn_k.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.v_proj.weight          -> blk.27.attn_v.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.o_proj.weight          -> blk.27.attn_output.weight                | F16    | [3072, 3072]
model.layers.27.mlp.gate_proj.weight             -> blk.27.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.27.mlp.up_proj.weight               -> blk.27.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.27.mlp.down_proj.weight             -> blk.27.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.27.input_layernorm.weight           -> blk.27.attn_norm.weight                  | F16    | [3072]
model.layers.27.post_attention_layernorm.weight  -> blk.27.ffn_norm.weight                   | F16    | [3072]
model.layers.28.self_attn.q_proj.weight          -> blk.28.attn_q.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.k_proj.weight          -> blk.28.attn_k.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.v_proj.weight          -> blk.28.attn_v.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.o_proj.weight          -> blk.28.attn_output.weight                | F16    | [3072, 3072]
model.layers.28.mlp.gate_proj.weight             -> blk.28.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.28.mlp.up_proj.weight               -> blk.28.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.28.mlp.down_proj.weight             -> blk.28.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.28.input_layernorm.weight           -> blk.28.attn_norm.weight                  | F16    | [3072]
model.layers.28.post_attention_layernorm.weight  -> blk.28.ffn_norm.weight                   | F16    | [3072]
model.layers.29.self_attn.q_proj.weight          -> blk.29.attn_q.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.k_proj.weight          -> blk.29.attn_k.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.v_proj.weight          -> blk.29.attn_v.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.o_proj.weight          -> blk.29.attn_output.weight                | F16    | [3072, 3072]
model.layers.29.mlp.gate_proj.weight             -> blk.29.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.29.mlp.up_proj.weight               -> blk.29.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.29.mlp.down_proj.weight             -> blk.29.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.29.input_layernorm.weight           -> blk.29.attn_norm.weight                  | F16    | [3072]
model.layers.29.post_attention_layernorm.weight  -> blk.29.ffn_norm.weight                   | F16    | [3072]
model.layers.30.self_attn.q_proj.weight          -> blk.30.attn_q.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.k_proj.weight          -> blk.30.attn_k.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.v_proj.weight          -> blk.30.attn_v.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.o_proj.weight          -> blk.30.attn_output.weight                | F16    | [3072, 3072]
model.layers.30.mlp.gate_proj.weight             -> blk.30.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.30.mlp.up_proj.weight               -> blk.30.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.30.mlp.down_proj.weight             -> blk.30.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.30.input_layernorm.weight           -> blk.30.attn_norm.weight                  | F16    | [3072]
model.layers.30.post_attention_layernorm.weight  -> blk.30.ffn_norm.weight                   | F16    | [3072]
model.layers.31.self_attn.q_proj.weight          -> blk.31.attn_q.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.k_proj.weight          -> blk.31.attn_k.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.v_proj.weight          -> blk.31.attn_v.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.o_proj.weight          -> blk.31.attn_output.weight                | F16    | [3072, 3072]
model.layers.31.mlp.gate_proj.weight             -> blk.31.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.31.mlp.up_proj.weight               -> blk.31.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.31.mlp.down_proj.weight             -> blk.31.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.31.input_layernorm.weight           -> blk.31.attn_norm.weight                  | F16    | [3072]
model.layers.31.post_attention_layernorm.weight  -> blk.31.ffn_norm.weight                   | F16    | [3072]
model.norm.weight                                -> output_norm.weight                       | F16    | [3072]
lm_head.weight                                   -> output.weight                            | F16    | [32064, 3072]
Writing model-unsloth.F16.gguf, format 1
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-2e4f2e9c1ca2> in <cell line: 10>()
      8 
      9 # Save to q4_k_m GGUF
---> 10 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
     11 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

1 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
    962             )
    963         else:
--> 964             raise RuntimeError(
    965                 f"Unsloth: Quantization failed for {final_location}\n"\
    966                 "You might have to compile llama.cpp yourself, then run this again.\n"\

RuntimeError: Unsloth: Quantization failed for ./model-unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
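
For reference, once llama.cpp is compiled as above, redoing just the failed quantization step by hand would look roughly like this (a sketch only; it assumes the F16 GGUF from the failed run, ./model-unsloth.F16.gguf, is still present in the folder you saved from, and that the quantize binary was built in the llama.cpp checkout next to it):

./llama.cpp/quantize ./model-unsloth.F16.gguf ./model-unsloth.Q4_K_M.gguf q4_k_m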
@danielhanchen added the "currently fixing (Am fixing now!)" label on May 4, 2024
@danielhanchen
Contributor

Working on a fix! Sorry about the issue again!

@DrewThomasson

I also await this fix!

@win4r

win4r commented May 12, 2024

Did you fix it?

@Li-Yanzhi
Author

Still waiting...

@Li-Yanzhi
Author

I have manually saved and quantized the GGUF model with the steps below (in a Windows 11 environment; a Linux/Colab equivalent is sketched after the list):

  1. In the notebook that Unsloth shared on Colab, merge the LoRA adapters into the model after training and save it to the model directory as a 16-bit HF model, which generates files like model-00001-of-00002.safetensors, model-00002-of-00002.safetensors, etc.
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")
  2. In the llama.cpp directory (cd llama.cpp), run the following command to convert the safetensors files into GGUF format and save the result in the output directory (replace XX with the actual directory):
python convert.py XX\model --outfile XX\output\my-phi-3.gguf --pad-vocab --outtype f16
  3. In the llama.cpp directory, run the following command to quantize my-phi-3.gguf into the q4_k_m format (the quantization type is passed as a positional argument):
quantize.exe XX\output\my-phi-3.gguf XX\output\my-phi-3-q4_k_m.gguf q4_k_m
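
For reference, the equivalent of steps 2 and 3 on Linux or in Colab would look roughly like this (a sketch only, mirroring the commands above; directory paths are placeholders):

python convert.py /path/to/model --outfile /path/to/output/my-phi-3.gguf --pad-vocab --outtype f16
./quantize /path/to/output/my-phi-3.gguf /path/to/output/my-phi-3-q4_k_m.gguf q4_k_m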

I followed the format of the datasets referenced in the notebook to generate nearly 300 training data entries from the user manual of an internal application, and then performed fine-tuning. However, neither the fine-tuned LoRA model nor the quantized model seems able to correctly answer the same questions from the dataset. I am still unsure which step might be causing the issue.

@danielhanchen
Contributor

Apologies everyone! @Li-Yanzhi @win4r @DrewThomasson Whoops, I forgot to inform you all that it should be fixed! (I actually pushed a fix a few days ago.)

Please update Unsloth for local installations:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

For Colab and Kaggle, no update is needed; just restart the kernel.

Apologies for the delay - hope it works now!

@danielhanchen added the "fixed - pending confirmation (Fixed, waiting for confirmation from poster)" label and removed the "currently fixing (Am fixing now!)" label on May 15, 2024
@eugeniosegala

eugeniosegala commented May 16, 2024

@danielhanchen I think there is something strange happening.

It looks like llama.cpp is able to start the quantisation process now; however, it generates a very large amount of log output, which ends up freezing Google Colab.

Did anyone else experience this?

These are some of the logs that are generated:

Screenshot 2024-05-16 at 12 54 51

I performed my test on the latest version of Unsloth on a fresh Google Colab instance, using 8-bit Q8_0 and q4_k_m quantisation.

@eugeniosegala

Probably related to #476

@Li-Yanzhi
Author

Thanks @danielhanchen, I can run Phi-3 notebook successfully in Colab now.

BTW: when I run the same code on my Windows 11 machine, there are some filename issues (e.g. quantize vs quantize.exe) in save.py; after manually editing these, I can run the code on my Windows laptop too. The only remaining issue is that my own training dataset does not seem to be learned well by the LoRA model, and the model cannot answer the questions from the dataset correctly; I will try to figure this out...

@eugeniosegala

eugeniosegala commented May 17, 2024

I can confirm that GGUF quantisation works now! thanks! 🙏
