Quantization (q4_k_m gguf) failed for Phi-3 #413

Open
Li-Yanzhi opened this issue May 2, 2024 · 10 comments
Labels
fixed - pending confirmation (Fixed, waiting for confirmation from poster)

Comments

@Li-Yanzhi

When running the Alpaca + Phi-3 3.8b full example.ipynb notebook at https://colab.research.google.com/drive/1NvkBmkHfucGO3Ve9s1NKZvMNlw5p83ym?usp=sharing, the last step, which saves the quantized model:

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

fails with the following error:

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 5.96 out of 12.67 RAM for saving.
100%|██████████| 32/32 [00:01<00:00, 19.69it/s]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving model/pytorch_model-00001-of-00002.bin...
Unsloth: Saving model/pytorch_model-00002-of-00002.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
 "-____-"     In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at model into f16 GGUF format.
The output location will be ./model-unsloth.F16.gguf
This will take 3 minutes...
Loading model file model/pytorch_model-00001-of-00002.bin
Loading model file model/pytorch_model-00001-of-00002.bin
Loading model file model/pytorch_model-00002-of-00002.bin
params = Params(n_vocab=32064, n_embd=3072, n_layer=32, n_ctx=4096, n_ff=8192, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('model'))
Loaded vocab file PosixPath('model/tokenizer.model'), type 'spm'
Vocab info: <SentencePieceVocab with 32000 base tokens and 11 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 32000, 'unk': 0, 'pad': 32000}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight                        -> token_embd.weight                        | F16    | [32064, 3072]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | F16    | [3072, 3072]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | F16    | [3072, 3072]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | F16    | [3072]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | F16    | [3072]
model.layers.1.self_attn.q_proj.weight           -> blk.1.attn_q.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.k_proj.weight           -> blk.1.attn_k.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.v_proj.weight           -> blk.1.attn_v.weight                      | F16    | [3072, 3072]
model.layers.1.self_attn.o_proj.weight           -> blk.1.attn_output.weight                 | F16    | [3072, 3072]
model.layers.1.mlp.gate_proj.weight              -> blk.1.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.1.mlp.up_proj.weight                -> blk.1.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.1.mlp.down_proj.weight              -> blk.1.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.1.input_layernorm.weight            -> blk.1.attn_norm.weight                   | F16    | [3072]
model.layers.1.post_attention_layernorm.weight   -> blk.1.ffn_norm.weight                    | F16    | [3072]
model.layers.2.self_attn.q_proj.weight           -> blk.2.attn_q.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.k_proj.weight           -> blk.2.attn_k.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.v_proj.weight           -> blk.2.attn_v.weight                      | F16    | [3072, 3072]
model.layers.2.self_attn.o_proj.weight           -> blk.2.attn_output.weight                 | F16    | [3072, 3072]
model.layers.2.mlp.gate_proj.weight              -> blk.2.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.2.mlp.up_proj.weight                -> blk.2.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.2.mlp.down_proj.weight              -> blk.2.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.2.input_layernorm.weight            -> blk.2.attn_norm.weight                   | F16    | [3072]
model.layers.2.post_attention_layernorm.weight   -> blk.2.ffn_norm.weight                    | F16    | [3072]
model.layers.3.self_attn.q_proj.weight           -> blk.3.attn_q.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.k_proj.weight           -> blk.3.attn_k.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.v_proj.weight           -> blk.3.attn_v.weight                      | F16    | [3072, 3072]
model.layers.3.self_attn.o_proj.weight           -> blk.3.attn_output.weight                 | F16    | [3072, 3072]
model.layers.3.mlp.gate_proj.weight              -> blk.3.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.3.mlp.up_proj.weight                -> blk.3.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.3.mlp.down_proj.weight              -> blk.3.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.3.input_layernorm.weight            -> blk.3.attn_norm.weight                   | F16    | [3072]
model.layers.3.post_attention_layernorm.weight   -> blk.3.ffn_norm.weight                    | F16    | [3072]
model.layers.4.self_attn.q_proj.weight           -> blk.4.attn_q.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.k_proj.weight           -> blk.4.attn_k.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.v_proj.weight           -> blk.4.attn_v.weight                      | F16    | [3072, 3072]
model.layers.4.self_attn.o_proj.weight           -> blk.4.attn_output.weight                 | F16    | [3072, 3072]
model.layers.4.mlp.gate_proj.weight              -> blk.4.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.4.mlp.up_proj.weight                -> blk.4.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.4.mlp.down_proj.weight              -> blk.4.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.4.input_layernorm.weight            -> blk.4.attn_norm.weight                   | F16    | [3072]
model.layers.4.post_attention_layernorm.weight   -> blk.4.ffn_norm.weight                    | F16    | [3072]
model.layers.5.self_attn.q_proj.weight           -> blk.5.attn_q.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.k_proj.weight           -> blk.5.attn_k.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.v_proj.weight           -> blk.5.attn_v.weight                      | F16    | [3072, 3072]
model.layers.5.self_attn.o_proj.weight           -> blk.5.attn_output.weight                 | F16    | [3072, 3072]
model.layers.5.mlp.gate_proj.weight              -> blk.5.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.5.mlp.up_proj.weight                -> blk.5.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.5.mlp.down_proj.weight              -> blk.5.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.5.input_layernorm.weight            -> blk.5.attn_norm.weight                   | F16    | [3072]
model.layers.5.post_attention_layernorm.weight   -> blk.5.ffn_norm.weight                    | F16    | [3072]
model.layers.6.self_attn.q_proj.weight           -> blk.6.attn_q.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.k_proj.weight           -> blk.6.attn_k.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.v_proj.weight           -> blk.6.attn_v.weight                      | F16    | [3072, 3072]
model.layers.6.self_attn.o_proj.weight           -> blk.6.attn_output.weight                 | F16    | [3072, 3072]
model.layers.6.mlp.gate_proj.weight              -> blk.6.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.6.mlp.up_proj.weight                -> blk.6.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.6.mlp.down_proj.weight              -> blk.6.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.6.input_layernorm.weight            -> blk.6.attn_norm.weight                   | F16    | [3072]
model.layers.6.post_attention_layernorm.weight   -> blk.6.ffn_norm.weight                    | F16    | [3072]
model.layers.7.self_attn.q_proj.weight           -> blk.7.attn_q.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.k_proj.weight           -> blk.7.attn_k.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.v_proj.weight           -> blk.7.attn_v.weight                      | F16    | [3072, 3072]
model.layers.7.self_attn.o_proj.weight           -> blk.7.attn_output.weight                 | F16    | [3072, 3072]
model.layers.7.mlp.gate_proj.weight              -> blk.7.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.7.mlp.up_proj.weight                -> blk.7.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.7.mlp.down_proj.weight              -> blk.7.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.7.input_layernorm.weight            -> blk.7.attn_norm.weight                   | F16    | [3072]
model.layers.7.post_attention_layernorm.weight   -> blk.7.ffn_norm.weight                    | F16    | [3072]
model.layers.8.self_attn.q_proj.weight           -> blk.8.attn_q.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.k_proj.weight           -> blk.8.attn_k.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.v_proj.weight           -> blk.8.attn_v.weight                      | F16    | [3072, 3072]
model.layers.8.self_attn.o_proj.weight           -> blk.8.attn_output.weight                 | F16    | [3072, 3072]
model.layers.8.mlp.gate_proj.weight              -> blk.8.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.8.mlp.up_proj.weight                -> blk.8.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.8.mlp.down_proj.weight              -> blk.8.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.8.input_layernorm.weight            -> blk.8.attn_norm.weight                   | F16    | [3072]
model.layers.8.post_attention_layernorm.weight   -> blk.8.ffn_norm.weight                    | F16    | [3072]
model.layers.9.self_attn.q_proj.weight           -> blk.9.attn_q.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.k_proj.weight           -> blk.9.attn_k.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.v_proj.weight           -> blk.9.attn_v.weight                      | F16    | [3072, 3072]
model.layers.9.self_attn.o_proj.weight           -> blk.9.attn_output.weight                 | F16    | [3072, 3072]
model.layers.9.mlp.gate_proj.weight              -> blk.9.ffn_gate.weight                    | F16    | [8192, 3072]
model.layers.9.mlp.up_proj.weight                -> blk.9.ffn_up.weight                      | F16    | [8192, 3072]
model.layers.9.mlp.down_proj.weight              -> blk.9.ffn_down.weight                    | F16    | [3072, 8192]
model.layers.9.input_layernorm.weight            -> blk.9.attn_norm.weight                   | F16    | [3072]
model.layers.9.post_attention_layernorm.weight   -> blk.9.ffn_norm.weight                    | F16    | [3072]
model.layers.10.self_attn.q_proj.weight          -> blk.10.attn_q.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.k_proj.weight          -> blk.10.attn_k.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.v_proj.weight          -> blk.10.attn_v.weight                     | F16    | [3072, 3072]
model.layers.10.self_attn.o_proj.weight          -> blk.10.attn_output.weight                | F16    | [3072, 3072]
model.layers.10.mlp.gate_proj.weight             -> blk.10.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.10.mlp.up_proj.weight               -> blk.10.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.10.mlp.down_proj.weight             -> blk.10.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.10.input_layernorm.weight           -> blk.10.attn_norm.weight                  | F16    | [3072]
model.layers.10.post_attention_layernorm.weight  -> blk.10.ffn_norm.weight                   | F16    | [3072]
model.layers.11.self_attn.q_proj.weight          -> blk.11.attn_q.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.k_proj.weight          -> blk.11.attn_k.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.v_proj.weight          -> blk.11.attn_v.weight                     | F16    | [3072, 3072]
model.layers.11.self_attn.o_proj.weight          -> blk.11.attn_output.weight                | F16    | [3072, 3072]
model.layers.11.mlp.gate_proj.weight             -> blk.11.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.11.mlp.up_proj.weight               -> blk.11.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.11.mlp.down_proj.weight             -> blk.11.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.11.input_layernorm.weight           -> blk.11.attn_norm.weight                  | F16    | [3072]
model.layers.11.post_attention_layernorm.weight  -> blk.11.ffn_norm.weight                   | F16    | [3072]
model.layers.12.self_attn.q_proj.weight          -> blk.12.attn_q.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.k_proj.weight          -> blk.12.attn_k.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.v_proj.weight          -> blk.12.attn_v.weight                     | F16    | [3072, 3072]
model.layers.12.self_attn.o_proj.weight          -> blk.12.attn_output.weight                | F16    | [3072, 3072]
model.layers.12.mlp.gate_proj.weight             -> blk.12.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.12.mlp.up_proj.weight               -> blk.12.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.12.mlp.down_proj.weight             -> blk.12.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.12.input_layernorm.weight           -> blk.12.attn_norm.weight                  | F16    | [3072]
model.layers.12.post_attention_layernorm.weight  -> blk.12.ffn_norm.weight                   | F16    | [3072]
model.layers.13.self_attn.q_proj.weight          -> blk.13.attn_q.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.k_proj.weight          -> blk.13.attn_k.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.v_proj.weight          -> blk.13.attn_v.weight                     | F16    | [3072, 3072]
model.layers.13.self_attn.o_proj.weight          -> blk.13.attn_output.weight                | F16    | [3072, 3072]
model.layers.13.mlp.gate_proj.weight             -> blk.13.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.13.mlp.up_proj.weight               -> blk.13.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.13.mlp.down_proj.weight             -> blk.13.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.13.input_layernorm.weight           -> blk.13.attn_norm.weight                  | F16    | [3072]
model.layers.13.post_attention_layernorm.weight  -> blk.13.ffn_norm.weight                   | F16    | [3072]
model.layers.14.self_attn.q_proj.weight          -> blk.14.attn_q.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.k_proj.weight          -> blk.14.attn_k.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.v_proj.weight          -> blk.14.attn_v.weight                     | F16    | [3072, 3072]
model.layers.14.self_attn.o_proj.weight          -> blk.14.attn_output.weight                | F16    | [3072, 3072]
model.layers.14.mlp.gate_proj.weight             -> blk.14.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.14.mlp.up_proj.weight               -> blk.14.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.14.mlp.down_proj.weight             -> blk.14.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.14.input_layernorm.weight           -> blk.14.attn_norm.weight                  | F16    | [3072]
model.layers.14.post_attention_layernorm.weight  -> blk.14.ffn_norm.weight                   | F16    | [3072]
model.layers.15.self_attn.q_proj.weight          -> blk.15.attn_q.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.k_proj.weight          -> blk.15.attn_k.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.v_proj.weight          -> blk.15.attn_v.weight                     | F16    | [3072, 3072]
model.layers.15.self_attn.o_proj.weight          -> blk.15.attn_output.weight                | F16    | [3072, 3072]
model.layers.15.mlp.gate_proj.weight             -> blk.15.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.15.mlp.up_proj.weight               -> blk.15.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.15.mlp.down_proj.weight             -> blk.15.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.15.input_layernorm.weight           -> blk.15.attn_norm.weight                  | F16    | [3072]
model.layers.15.post_attention_layernorm.weight  -> blk.15.ffn_norm.weight                   | F16    | [3072]
model.layers.16.self_attn.q_proj.weight          -> blk.16.attn_q.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.k_proj.weight          -> blk.16.attn_k.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.v_proj.weight          -> blk.16.attn_v.weight                     | F16    | [3072, 3072]
model.layers.16.self_attn.o_proj.weight          -> blk.16.attn_output.weight                | F16    | [3072, 3072]
model.layers.16.mlp.gate_proj.weight             -> blk.16.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.16.mlp.up_proj.weight               -> blk.16.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.16.mlp.down_proj.weight             -> blk.16.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.16.input_layernorm.weight           -> blk.16.attn_norm.weight                  | F16    | [3072]
model.layers.16.post_attention_layernorm.weight  -> blk.16.ffn_norm.weight                   | F16    | [3072]
model.layers.17.self_attn.q_proj.weight          -> blk.17.attn_q.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.k_proj.weight          -> blk.17.attn_k.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.v_proj.weight          -> blk.17.attn_v.weight                     | F16    | [3072, 3072]
model.layers.17.self_attn.o_proj.weight          -> blk.17.attn_output.weight                | F16    | [3072, 3072]
model.layers.17.mlp.gate_proj.weight             -> blk.17.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.17.mlp.up_proj.weight               -> blk.17.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.17.mlp.down_proj.weight             -> blk.17.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.17.input_layernorm.weight           -> blk.17.attn_norm.weight                  | F16    | [3072]
model.layers.17.post_attention_layernorm.weight  -> blk.17.ffn_norm.weight                   | F16    | [3072]
model.layers.18.self_attn.q_proj.weight          -> blk.18.attn_q.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.k_proj.weight          -> blk.18.attn_k.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.v_proj.weight          -> blk.18.attn_v.weight                     | F16    | [3072, 3072]
model.layers.18.self_attn.o_proj.weight          -> blk.18.attn_output.weight                | F16    | [3072, 3072]
model.layers.18.mlp.gate_proj.weight             -> blk.18.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.18.mlp.up_proj.weight               -> blk.18.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.18.mlp.down_proj.weight             -> blk.18.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.18.input_layernorm.weight           -> blk.18.attn_norm.weight                  | F16    | [3072]
model.layers.18.post_attention_layernorm.weight  -> blk.18.ffn_norm.weight                   | F16    | [3072]
model.layers.19.self_attn.q_proj.weight          -> blk.19.attn_q.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.k_proj.weight          -> blk.19.attn_k.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.v_proj.weight          -> blk.19.attn_v.weight                     | F16    | [3072, 3072]
model.layers.19.self_attn.o_proj.weight          -> blk.19.attn_output.weight                | F16    | [3072, 3072]
model.layers.19.mlp.gate_proj.weight             -> blk.19.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.19.mlp.up_proj.weight               -> blk.19.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.19.mlp.down_proj.weight             -> blk.19.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.19.input_layernorm.weight           -> blk.19.attn_norm.weight                  | F16    | [3072]
model.layers.19.post_attention_layernorm.weight  -> blk.19.ffn_norm.weight                   | F16    | [3072]
model.layers.20.self_attn.q_proj.weight          -> blk.20.attn_q.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.k_proj.weight          -> blk.20.attn_k.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.v_proj.weight          -> blk.20.attn_v.weight                     | F16    | [3072, 3072]
model.layers.20.self_attn.o_proj.weight          -> blk.20.attn_output.weight                | F16    | [3072, 3072]
model.layers.20.mlp.gate_proj.weight             -> blk.20.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.20.mlp.up_proj.weight               -> blk.20.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.20.mlp.down_proj.weight             -> blk.20.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.20.input_layernorm.weight           -> blk.20.attn_norm.weight                  | F16    | [3072]
model.layers.20.post_attention_layernorm.weight  -> blk.20.ffn_norm.weight                   | F16    | [3072]
model.layers.21.self_attn.q_proj.weight          -> blk.21.attn_q.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.k_proj.weight          -> blk.21.attn_k.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.v_proj.weight          -> blk.21.attn_v.weight                     | F16    | [3072, 3072]
model.layers.21.self_attn.o_proj.weight          -> blk.21.attn_output.weight                | F16    | [3072, 3072]
model.layers.21.mlp.gate_proj.weight             -> blk.21.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.21.mlp.up_proj.weight               -> blk.21.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.21.mlp.down_proj.weight             -> blk.21.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.21.input_layernorm.weight           -> blk.21.attn_norm.weight                  | F16    | [3072]
model.layers.21.post_attention_layernorm.weight  -> blk.21.ffn_norm.weight                   | F16    | [3072]
model.layers.22.self_attn.q_proj.weight          -> blk.22.attn_q.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.k_proj.weight          -> blk.22.attn_k.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.v_proj.weight          -> blk.22.attn_v.weight                     | F16    | [3072, 3072]
model.layers.22.self_attn.o_proj.weight          -> blk.22.attn_output.weight                | F16    | [3072, 3072]
model.layers.22.mlp.gate_proj.weight             -> blk.22.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.22.mlp.up_proj.weight               -> blk.22.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.22.mlp.down_proj.weight             -> blk.22.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.22.input_layernorm.weight           -> blk.22.attn_norm.weight                  | F16    | [3072]
model.layers.22.post_attention_layernorm.weight  -> blk.22.ffn_norm.weight                   | F16    | [3072]
model.layers.23.self_attn.q_proj.weight          -> blk.23.attn_q.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.k_proj.weight          -> blk.23.attn_k.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.v_proj.weight          -> blk.23.attn_v.weight                     | F16    | [3072, 3072]
model.layers.23.self_attn.o_proj.weight          -> blk.23.attn_output.weight                | F16    | [3072, 3072]
model.layers.23.mlp.gate_proj.weight             -> blk.23.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.23.mlp.up_proj.weight               -> blk.23.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.23.mlp.down_proj.weight             -> blk.23.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.23.input_layernorm.weight           -> blk.23.attn_norm.weight                  | F16    | [3072]
model.layers.23.post_attention_layernorm.weight  -> blk.23.ffn_norm.weight                   | F16    | [3072]
model.layers.24.self_attn.q_proj.weight          -> blk.24.attn_q.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.k_proj.weight          -> blk.24.attn_k.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.v_proj.weight          -> blk.24.attn_v.weight                     | F16    | [3072, 3072]
model.layers.24.self_attn.o_proj.weight          -> blk.24.attn_output.weight                | F16    | [3072, 3072]
model.layers.24.mlp.gate_proj.weight             -> blk.24.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.24.mlp.up_proj.weight               -> blk.24.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.24.mlp.down_proj.weight             -> blk.24.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.24.input_layernorm.weight           -> blk.24.attn_norm.weight                  | F16    | [3072]
model.layers.24.post_attention_layernorm.weight  -> blk.24.ffn_norm.weight                   | F16    | [3072]
model.layers.25.self_attn.q_proj.weight          -> blk.25.attn_q.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.k_proj.weight          -> blk.25.attn_k.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.v_proj.weight          -> blk.25.attn_v.weight                     | F16    | [3072, 3072]
model.layers.25.self_attn.o_proj.weight          -> blk.25.attn_output.weight                | F16    | [3072, 3072]
model.layers.25.mlp.gate_proj.weight             -> blk.25.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.25.mlp.up_proj.weight               -> blk.25.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.25.mlp.down_proj.weight             -> blk.25.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.25.input_layernorm.weight           -> blk.25.attn_norm.weight                  | F16    | [3072]
model.layers.25.post_attention_layernorm.weight  -> blk.25.ffn_norm.weight                   | F16    | [3072]
model.layers.26.self_attn.q_proj.weight          -> blk.26.attn_q.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.k_proj.weight          -> blk.26.attn_k.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.v_proj.weight          -> blk.26.attn_v.weight                     | F16    | [3072, 3072]
model.layers.26.self_attn.o_proj.weight          -> blk.26.attn_output.weight                | F16    | [3072, 3072]
model.layers.26.mlp.gate_proj.weight             -> blk.26.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.26.mlp.up_proj.weight               -> blk.26.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.26.mlp.down_proj.weight             -> blk.26.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.26.input_layernorm.weight           -> blk.26.attn_norm.weight                  | F16    | [3072]
model.layers.26.post_attention_layernorm.weight  -> blk.26.ffn_norm.weight                   | F16    | [3072]
model.layers.27.self_attn.q_proj.weight          -> blk.27.attn_q.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.k_proj.weight          -> blk.27.attn_k.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.v_proj.weight          -> blk.27.attn_v.weight                     | F16    | [3072, 3072]
model.layers.27.self_attn.o_proj.weight          -> blk.27.attn_output.weight                | F16    | [3072, 3072]
model.layers.27.mlp.gate_proj.weight             -> blk.27.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.27.mlp.up_proj.weight               -> blk.27.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.27.mlp.down_proj.weight             -> blk.27.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.27.input_layernorm.weight           -> blk.27.attn_norm.weight                  | F16    | [3072]
model.layers.27.post_attention_layernorm.weight  -> blk.27.ffn_norm.weight                   | F16    | [3072]
model.layers.28.self_attn.q_proj.weight          -> blk.28.attn_q.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.k_proj.weight          -> blk.28.attn_k.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.v_proj.weight          -> blk.28.attn_v.weight                     | F16    | [3072, 3072]
model.layers.28.self_attn.o_proj.weight          -> blk.28.attn_output.weight                | F16    | [3072, 3072]
model.layers.28.mlp.gate_proj.weight             -> blk.28.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.28.mlp.up_proj.weight               -> blk.28.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.28.mlp.down_proj.weight             -> blk.28.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.28.input_layernorm.weight           -> blk.28.attn_norm.weight                  | F16    | [3072]
model.layers.28.post_attention_layernorm.weight  -> blk.28.ffn_norm.weight                   | F16    | [3072]
model.layers.29.self_attn.q_proj.weight          -> blk.29.attn_q.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.k_proj.weight          -> blk.29.attn_k.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.v_proj.weight          -> blk.29.attn_v.weight                     | F16    | [3072, 3072]
model.layers.29.self_attn.o_proj.weight          -> blk.29.attn_output.weight                | F16    | [3072, 3072]
model.layers.29.mlp.gate_proj.weight             -> blk.29.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.29.mlp.up_proj.weight               -> blk.29.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.29.mlp.down_proj.weight             -> blk.29.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.29.input_layernorm.weight           -> blk.29.attn_norm.weight                  | F16    | [3072]
model.layers.29.post_attention_layernorm.weight  -> blk.29.ffn_norm.weight                   | F16    | [3072]
model.layers.30.self_attn.q_proj.weight          -> blk.30.attn_q.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.k_proj.weight          -> blk.30.attn_k.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.v_proj.weight          -> blk.30.attn_v.weight                     | F16    | [3072, 3072]
model.layers.30.self_attn.o_proj.weight          -> blk.30.attn_output.weight                | F16    | [3072, 3072]
model.layers.30.mlp.gate_proj.weight             -> blk.30.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.30.mlp.up_proj.weight               -> blk.30.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.30.mlp.down_proj.weight             -> blk.30.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.30.input_layernorm.weight           -> blk.30.attn_norm.weight                  | F16    | [3072]
model.layers.30.post_attention_layernorm.weight  -> blk.30.ffn_norm.weight                   | F16    | [3072]
model.layers.31.self_attn.q_proj.weight          -> blk.31.attn_q.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.k_proj.weight          -> blk.31.attn_k.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.v_proj.weight          -> blk.31.attn_v.weight                     | F16    | [3072, 3072]
model.layers.31.self_attn.o_proj.weight          -> blk.31.attn_output.weight                | F16    | [3072, 3072]
model.layers.31.mlp.gate_proj.weight             -> blk.31.ffn_gate.weight                   | F16    | [8192, 3072]
model.layers.31.mlp.up_proj.weight               -> blk.31.ffn_up.weight                     | F16    | [8192, 3072]
model.layers.31.mlp.down_proj.weight             -> blk.31.ffn_down.weight                   | F16    | [3072, 8192]
model.layers.31.input_layernorm.weight           -> blk.31.attn_norm.weight                  | F16    | [3072]
model.layers.31.post_attention_layernorm.weight  -> blk.31.ffn_norm.weight                   | F16    | [3072]
model.norm.weight                                -> output_norm.weight                       | F16    | [3072]
lm_head.weight                                   -> output.weight                            | F16    | [32064, 3072]
Writing model-unsloth.F16.gguf, format 1
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-2e4f2e9c1ca2> in <cell line: 10>()
      8 
      9 # Save to q4_k_m GGUF
---> 10 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
     11 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

1 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
    962             )
    963         else:
--> 964             raise RuntimeError(
    965                 f"Unsloth: Quantization failed for {final_location}\n"\
    966                 "You might have to compile llama.cpp yourself, then run this again.\n"\

RuntimeError: Unsloth: Quantization failed for ./model-unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
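
For reference, once llama.cpp is compiled as above, redoing just the failed quantization step by hand would look roughly like this (a sketch only; it assumes the F16 GGUF from the failed run, ./model-unsloth.F16.gguf, is still present in the folder you saved from, and that the quantize binary was built in the llama.cpp checkout next to it):

./llama.cpp/quantize ./model-unsloth.F16.gguf ./model-unsloth.Q4_K_M.gguf q4_k_m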
@danielhanchen added the "currently fixing (Am fixing now!)" label on May 4, 2024
@danielhanchen
Contributor

Working on a fix! Sorry about the issue again!

@DrewThomasson

I also await this fix!

@win4r

win4r commented May 12, 2024

Did you fix it?

@Li-Yanzhi
Author

Still waiting...

@Li-Yanzhi
Author

I have manually saved and quantized the GGUF model with the steps below (in a Windows 11 environment; a Linux/Colab equivalent is sketched after the list):

  1. In the notebook that Unsloth shared on Colab, merge the LoRA adapters into the model after training and save it to the model directory as a 16-bit HF model, which generates files like model-00001-of-00002.safetensors, model-00002-of-00002.safetensors, etc.
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")
  2. In the llama.cpp directory (cd llama.cpp), run the following command to convert the safetensors files into GGUF format and save the result in the output directory (replace XX with the actual directory):
python convert.py XX\model --outfile XX\output\my-phi-3.gguf --pad-vocab --outtype f16
  3. In the llama.cpp directory, run the following command to quantize my-phi-3.gguf into the q4_k_m format (the quantization type is passed as a positional argument):
quantize.exe XX\output\my-phi-3.gguf XX\output\my-phi-3-q4_k_m.gguf q4_k_m
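
For reference, the equivalent of steps 2 and 3 on Linux or in Colab would look roughly like this (a sketch only, mirroring the commands above; directory paths are placeholders):

python convert.py /path/to/model --outfile /path/to/output/my-phi-3.gguf --pad-vocab --outtype f16
./quantize /path/to/output/my-phi-3.gguf /path/to/output/my-phi-3-q4_k_m.gguf q4_k_m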

I followed the format of the datasets referenced in the notebook to generate nearly 300 training data entries from the user manual of an internal application, and then performed fine-tuning. However, neither the fine-tuned LoRA model nor the quantized model seems able to correctly answer the same questions from the dataset. I am still unsure which step might be causing the issue.

@danielhanchen
Contributor

Apologies everyone! @Li-Yanzhi @win4r @DrewThomasson Whoops, I forgot to inform you all that it should be fixed! (I actually pushed a fix a few days ago.)

Please update Unsloth for local installations:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

For Colab and Kaggle, no update is needed; just restart the kernel.

Apologies for the delay - hope it works now!

@danielhanchen added the "fixed - pending confirmation (Fixed, waiting for confirmation from poster)" label and removed the "currently fixing (Am fixing now!)" label on May 15, 2024
@eugeniosegala

eugeniosegala commented May 16, 2024

@danielhanchen I think there is something strange happening.

It looks like llama.cpp is able to start the quantisation process now; however, it generates a very large amount of log output, which ends up freezing Google Colab.

Did anyone else experience this?

These are some of the logs that are generated:

Screenshot 2024-05-16 at 12 54 51

I performed my test on the latest version of Unsloth on a fresh Google Colab instance, using 8-bit Q8_0 and q4_k_m quantisation.

@eugeniosegala

Probably related to #476

@Li-Yanzhi
Author

Thanks @danielhanchen, I can run Phi-3 notebook successfully in Colab now.

BTW: when I run the same code on my Windows 11 machine, there are some filename issues (e.g. quantize vs quantize.exe) in save.py; after manually editing these, I can run the code on my Windows laptop too. The only remaining issue is that my own training dataset does not seem to be learned well by the LoRA model, and the model cannot answer the questions from the dataset correctly; I will try to figure this out...

@eugeniosegala

eugeniosegala commented May 17, 2024

I can confirm that GGUF quantisation works now! thanks! 🙏
