Add safe tensor support to convert-llama.py #52

Conversation

DifferentialityDevelopment
Contributor

I haven't updated the other model conversion scripts yet, but this allows you to convert any Llama model that uses safetensors.

@b4rtaz
Owner

b4rtaz commented May 14, 2024

Please also update docs/LLAMA.md.

Comment on lines 203 to 206
if '/' in modelPath:
    modelName = modelPath.split('/')[-1]
else:
    modelName = modelPath.split('\\')[-1]
Owner


I think the os.path.basename function would be a better way to extract the filename.
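For example (just a sketch, using the modelPath variable from the diff above):

    import os

    # basename follows the separator rules of the platform the script runs on
    # (both '\\' and '/' on Windows), so the manual '/' check can be dropped.
    modelName = os.path.basename(modelPath)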

@DifferentialityDevelopment
Contributor Author

> Please also update docs/LLAMA.md.

I updated the usage section a bit, though it could probably also mention that this works with the Hugging Face repo for Llama.

@b4rtaz
Owner

b4rtaz commented May 15, 2024

@DifferentialityDevelopment I'm wondering about this part:

        with safetensors.safe_open(model_file, framework="pt") as f:
            for layer in f.keys():
                layers.append({
                    "name" : layer,
                    "file" : model_file
                })

Are you sure that the source model has all layers in the correct order that is expected by Distributed Llama?

@DifferentialityDevelopment
Contributor Author

> @DifferentialityDevelopment I'm wondering about this part:
>
>         with safetensors.safe_open(model_file, framework="pt") as f:
>             for layer in f.keys():
>                 layers.append({
>                     "name" : layer,
>                     "file" : model_file
>                 })
>
> Are you sure that the source model has all layers in the correct order that is expected by Distributed Llama?

Did not check yet; I will do a full convert of Llama-3 8B Instruct, test it with Distributed Llama, and report back.

@DifferentialityDevelopment
Contributor Author

The convert process itself does seem to work fine, but I will test once it finishes:

python converter/convert-llama.py J:\Llama-3\Meta-Llama-3-8B-Instruct J:\Llama-3\Meta-Llama-3-8B-Instruct-Distributed q40
Model name: Meta-Llama-3-8B-Instruct
Target float type: q40
Target file: dllama_meta-llama-3-8b-instruct_q40.bin
Total layers: 291
Total chunks: 7
Unknown header key: head_size
{'head_size': 128.0, 'n_layers': 32, 'n_heads': 32, 'n_kv_heads': 8, 'max_seq_len': 8192, 'rope_theta': 500000, 'arch_type': 11259136, 'n_experts': 0, 'n_active_experts': 0}
💿 Chunking model 1/7...
Loading tensors for model.embed_tokens.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.embed_tokens.weight torch.Size([128256, 4096])...
Saved q40 tensor in 123.95s, 295501824 bytes
Loading tensors for model.layers.0.input_layernorm.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.input_layernorm.weight torch.Size([4096])...
Saved q40 tensor in 0.00s, 2304 bytes
Loading tensors for model.layers.0.mlp.down_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.mlp.down_proj.weight torch.Size([4096, 14336])...
Saved q40 tensor in 14.69s, 33030144 bytes
Loading tensors for model.layers.0.mlp.gate_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.mlp.gate_proj.weight torch.Size([14336, 4096])...
Saved q40 tensor in 14.96s, 33030144 bytes
Loading tensors for model.layers.0.mlp.up_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.mlp.up_proj.weight torch.Size([14336, 4096])...
Saved q40 tensor in 14.95s, 33030144 bytes
Loading tensors for model.layers.0.post_attention_layernorm.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.post_attention_layernorm.weight torch.Size([4096])...
Saved q40 tensor in 0.00s, 2304 bytes
Loading tensors for model.layers.0.self_attn.k_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.self_attn.k_proj.weight torch.Size([1024, 4096])...
Saved q40 tensor in 1.08s, 2359296 bytes
Loading tensors for model.layers.0.self_attn.o_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.self_attn.o_proj.weight torch.Size([4096, 4096])...
Saved q40 tensor in 4.37s, 9437184 bytes
Loading tensors for model.layers.0.self_attn.q_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.self_attn.q_proj.weight torch.Size([4096, 4096])...
Saved q40 tensor in 4.27s, 9437184 bytes
Loading tensors for model.layers.0.self_attn.v_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.0.self_attn.v_proj.weight torch.Size([1024, 4096])...
Saved q40 tensor in 1.05s, 2359296 bytes
Loading tensors for model.layers.1.input_layernorm.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.1.input_layernorm.weight torch.Size([4096])...
Saved q40 tensor in 0.00s, 2304 bytes
Loading tensors for model.layers.1.mlp.down_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.1.mlp.down_proj.weight torch.Size([4096, 14336])...
Saved q40 tensor in 14.91s, 33030144 bytes
Loading tensors for model.layers.1.mlp.gate_proj.weight from: model-00001-of-00004.safetensors
🔶 Exporting model.layers.1.mlp.gate_proj.weight torch.Size([14336, 4096])...
Saved q40 tensor in 14.76s, 33030144 bytes

@b4rtaz
Owner

b4rtaz commented May 15, 2024

Please also consider that some models may have a different layer order for some reason.

@DifferentialityDevelopment
Contributor Author

> Please also consider that some models may have a different layer order for some reason.

I would think the order of the keys when loading a .safetensors model is the same as from the .pth file, but I could be wrong; I will do a bit of research.

@DifferentialityDevelopment
Contributor Author

You're absolutely right, the layers are not necessarily in the right order; see the output of their keys below, where I noticed that layer 9 only appears after layer 20.
So I will need to fix the ordering.
I'm not entirely sure where to place lm_head.weight and model.norm.weight; they appear near the end of the list.
The other thing I'm having trouble with is that I'm not sure which of the layers is the feed_forward layer, which is what the .pth conversion uses to get the hidden_dim size.
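For the hidden_dim part, a rough sketch of what I have in mind (untested; it assumes mlp.gate_proj plays the role of feed_forward.w1, so its first dimension is the feed-forward hidden size, and that model_file is the shard that contains layer 0):

    import safetensors

    # Rough sketch: if mlp.gate_proj corresponds to feed_forward.w1,
    # its shape is [hidden_dim, dim].
    with safetensors.safe_open(model_file, framework="pt") as f:
        gateProj = f.get_tensor('model.layers.0.mlp.gate_proj.weight')
        hiddenDim = gateProj.shape[0]  # 14336 for Llama-3 8B
        dim = gateProj.shape[1]        # 4096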

Additionally, they use a different naming convention, so I had to change a few more things.
Is this correct:
[safetensor] model.embed_tokens.weight -> [pth] tok_embeddings.weight
[safetensor] model.layers.0.mlp.gate_proj.weight -> [pth] layers.0.feed_forward.w1.weight
[safetensor] model.layers.0.mlp.up_proj.weight -> [pth] layers.0.feed_forward.w2.weight
[safetensor] model.layers.0.post_attention_layernorm.weight -> [pth] layers.0.attention_norm.weight
[safetensor] model.norm.weight -> [pth] norm.weight

Keys:

model.embed_tokens.weight => 128256
model.layers.0.input_layernorm.weight => 4096
model.layers.0.mlp.down_proj.weight => 4096
model.layers.0.mlp.gate_proj.weight => 14336
model.layers.0.mlp.up_proj.weight => 14336
model.layers.0.post_attention_layernorm.weight => 4096
model.layers.0.self_attn.k_proj.weight => 1024
model.layers.0.self_attn.o_proj.weight => 4096
model.layers.0.self_attn.q_proj.weight => 4096
model.layers.0.self_attn.v_proj.weight => 1024
model.layers.1.input_layernorm.weight => 4096
model.layers.1.mlp.down_proj.weight => 4096
model.layers.1.mlp.gate_proj.weight => 14336
model.layers.1.mlp.up_proj.weight => 14336
model.layers.1.post_attention_layernorm.weight => 4096
model.layers.1.self_attn.k_proj.weight => 1024
model.layers.1.self_attn.o_proj.weight => 4096
model.layers.1.self_attn.q_proj.weight => 4096
model.layers.1.self_attn.v_proj.weight => 1024
model.layers.2.input_layernorm.weight => 4096
model.layers.2.mlp.down_proj.weight => 4096
model.layers.2.mlp.gate_proj.weight => 14336
model.layers.2.mlp.up_proj.weight => 14336
model.layers.2.post_attention_layernorm.weight => 4096
model.layers.2.self_attn.k_proj.weight => 1024
model.layers.2.self_attn.o_proj.weight => 4096
model.layers.2.self_attn.q_proj.weight => 4096
model.layers.2.self_attn.v_proj.weight => 1024
model.layers.3.input_layernorm.weight => 4096
model.layers.3.mlp.down_proj.weight => 4096
model.layers.3.mlp.gate_proj.weight => 14336
model.layers.3.mlp.up_proj.weight => 14336
model.layers.3.post_attention_layernorm.weight => 4096
model.layers.3.self_attn.k_proj.weight => 1024
model.layers.3.self_attn.o_proj.weight => 4096
model.layers.3.self_attn.q_proj.weight => 4096
model.layers.3.self_attn.v_proj.weight => 1024
model.layers.4.input_layernorm.weight => 4096
model.layers.4.mlp.down_proj.weight => 4096
model.layers.4.mlp.gate_proj.weight => 14336
model.layers.4.mlp.up_proj.weight => 14336
model.layers.4.post_attention_layernorm.weight => 4096
model.layers.4.self_attn.k_proj.weight => 1024
model.layers.4.self_attn.o_proj.weight => 4096
model.layers.4.self_attn.q_proj.weight => 4096
model.layers.4.self_attn.v_proj.weight => 1024
model.layers.5.input_layernorm.weight => 4096
model.layers.5.mlp.down_proj.weight => 4096
model.layers.5.mlp.gate_proj.weight => 14336
model.layers.5.mlp.up_proj.weight => 14336
model.layers.5.post_attention_layernorm.weight => 4096
model.layers.5.self_attn.k_proj.weight => 1024
model.layers.5.self_attn.o_proj.weight => 4096
model.layers.5.self_attn.q_proj.weight => 4096
model.layers.5.self_attn.v_proj.weight => 1024
model.layers.6.input_layernorm.weight => 4096
model.layers.6.mlp.down_proj.weight => 4096
model.layers.6.mlp.gate_proj.weight => 14336
model.layers.6.mlp.up_proj.weight => 14336
model.layers.6.post_attention_layernorm.weight => 4096
model.layers.6.self_attn.k_proj.weight => 1024
model.layers.6.self_attn.o_proj.weight => 4096
model.layers.6.self_attn.q_proj.weight => 4096
model.layers.6.self_attn.v_proj.weight => 1024
model.layers.7.input_layernorm.weight => 4096
model.layers.7.mlp.down_proj.weight => 4096
model.layers.7.mlp.gate_proj.weight => 14336
model.layers.7.mlp.up_proj.weight => 14336
model.layers.7.post_attention_layernorm.weight => 4096
model.layers.7.self_attn.k_proj.weight => 1024
model.layers.7.self_attn.o_proj.weight => 4096
model.layers.7.self_attn.q_proj.weight => 4096
model.layers.7.self_attn.v_proj.weight => 1024
model.layers.8.input_layernorm.weight => 4096
model.layers.8.mlp.down_proj.weight => 4096
model.layers.8.mlp.gate_proj.weight => 14336
model.layers.8.mlp.up_proj.weight => 14336
model.layers.8.post_attention_layernorm.weight => 4096
model.layers.8.self_attn.k_proj.weight => 1024
model.layers.8.self_attn.o_proj.weight => 4096
model.layers.8.self_attn.q_proj.weight => 4096
model.layers.8.self_attn.v_proj.weight => 1024
model.layers.10.input_layernorm.weight => 4096
model.layers.10.mlp.down_proj.weight => 4096
model.layers.10.mlp.gate_proj.weight => 14336
model.layers.10.mlp.up_proj.weight => 14336
model.layers.10.post_attention_layernorm.weight => 4096
model.layers.10.self_attn.k_proj.weight => 1024
model.layers.10.self_attn.o_proj.weight => 4096
model.layers.10.self_attn.q_proj.weight => 4096
model.layers.10.self_attn.v_proj.weight => 1024
model.layers.11.input_layernorm.weight => 4096
model.layers.11.mlp.down_proj.weight => 4096
model.layers.11.mlp.gate_proj.weight => 14336
model.layers.11.mlp.up_proj.weight => 14336
model.layers.11.post_attention_layernorm.weight => 4096
model.layers.11.self_attn.k_proj.weight => 1024
model.layers.11.self_attn.o_proj.weight => 4096
model.layers.11.self_attn.q_proj.weight => 4096
model.layers.11.self_attn.v_proj.weight => 1024
model.layers.12.input_layernorm.weight => 4096
model.layers.12.mlp.down_proj.weight => 4096
model.layers.12.mlp.gate_proj.weight => 14336
model.layers.12.mlp.up_proj.weight => 14336
model.layers.12.post_attention_layernorm.weight => 4096
model.layers.12.self_attn.k_proj.weight => 1024
model.layers.12.self_attn.o_proj.weight => 4096
model.layers.12.self_attn.q_proj.weight => 4096
model.layers.12.self_attn.v_proj.weight => 1024
model.layers.13.input_layernorm.weight => 4096
model.layers.13.mlp.down_proj.weight => 4096
model.layers.13.mlp.gate_proj.weight => 14336
model.layers.13.mlp.up_proj.weight => 14336
model.layers.13.post_attention_layernorm.weight => 4096
model.layers.13.self_attn.k_proj.weight => 1024
model.layers.13.self_attn.o_proj.weight => 4096
model.layers.13.self_attn.q_proj.weight => 4096
model.layers.13.self_attn.v_proj.weight => 1024
model.layers.14.input_layernorm.weight => 4096
model.layers.14.mlp.down_proj.weight => 4096
model.layers.14.mlp.gate_proj.weight => 14336
model.layers.14.mlp.up_proj.weight => 14336
model.layers.14.post_attention_layernorm.weight => 4096
model.layers.14.self_attn.k_proj.weight => 1024
model.layers.14.self_attn.o_proj.weight => 4096
model.layers.14.self_attn.q_proj.weight => 4096
model.layers.14.self_attn.v_proj.weight => 1024
model.layers.15.input_layernorm.weight => 4096
model.layers.15.mlp.down_proj.weight => 4096
model.layers.15.mlp.gate_proj.weight => 14336
model.layers.15.mlp.up_proj.weight => 14336
model.layers.15.post_attention_layernorm.weight => 4096
model.layers.15.self_attn.k_proj.weight => 1024
model.layers.15.self_attn.o_proj.weight => 4096
model.layers.15.self_attn.q_proj.weight => 4096
model.layers.15.self_attn.v_proj.weight => 1024
model.layers.16.input_layernorm.weight => 4096
model.layers.16.mlp.down_proj.weight => 4096
model.layers.16.mlp.gate_proj.weight => 14336
model.layers.16.mlp.up_proj.weight => 14336
model.layers.16.post_attention_layernorm.weight => 4096
model.layers.16.self_attn.k_proj.weight => 1024
model.layers.16.self_attn.o_proj.weight => 4096
model.layers.16.self_attn.q_proj.weight => 4096
model.layers.16.self_attn.v_proj.weight => 1024
model.layers.17.input_layernorm.weight => 4096
model.layers.17.mlp.down_proj.weight => 4096
model.layers.17.mlp.gate_proj.weight => 14336
model.layers.17.mlp.up_proj.weight => 14336
model.layers.17.post_attention_layernorm.weight => 4096
model.layers.17.self_attn.k_proj.weight => 1024
model.layers.17.self_attn.o_proj.weight => 4096
model.layers.17.self_attn.q_proj.weight => 4096
model.layers.17.self_attn.v_proj.weight => 1024
model.layers.18.input_layernorm.weight => 4096
model.layers.18.mlp.down_proj.weight => 4096
model.layers.18.mlp.gate_proj.weight => 14336
model.layers.18.mlp.up_proj.weight => 14336
model.layers.18.post_attention_layernorm.weight => 4096
model.layers.18.self_attn.k_proj.weight => 1024
model.layers.18.self_attn.o_proj.weight => 4096
model.layers.18.self_attn.q_proj.weight => 4096
model.layers.18.self_attn.v_proj.weight => 1024
model.layers.19.input_layernorm.weight => 4096
model.layers.19.mlp.down_proj.weight => 4096
model.layers.19.mlp.gate_proj.weight => 14336
model.layers.19.mlp.up_proj.weight => 14336
model.layers.19.post_attention_layernorm.weight => 4096
model.layers.19.self_attn.k_proj.weight => 1024
model.layers.19.self_attn.o_proj.weight => 4096
model.layers.19.self_attn.q_proj.weight => 4096
model.layers.19.self_attn.v_proj.weight => 1024
model.layers.20.mlp.gate_proj.weight => 14336
model.layers.20.self_attn.k_proj.weight => 1024
model.layers.20.self_attn.o_proj.weight => 4096
model.layers.20.self_attn.q_proj.weight => 4096
model.layers.20.self_attn.v_proj.weight => 1024
model.layers.9.input_layernorm.weight => 4096
model.layers.9.mlp.down_proj.weight => 4096
model.layers.9.mlp.gate_proj.weight => 14336
model.layers.9.mlp.up_proj.weight => 14336
model.layers.9.post_attention_layernorm.weight => 4096
model.layers.9.self_attn.k_proj.weight => 1024
model.layers.9.self_attn.o_proj.weight => 4096
model.layers.9.self_attn.q_proj.weight => 4096
model.layers.9.self_attn.v_proj.weight => 1024
model.layers.20.input_layernorm.weight => 4096
model.layers.20.mlp.down_proj.weight => 4096
model.layers.20.mlp.up_proj.weight => 14336
model.layers.20.post_attention_layernorm.weight => 4096
model.layers.21.input_layernorm.weight => 4096
model.layers.21.mlp.down_proj.weight => 4096
model.layers.21.mlp.gate_proj.weight => 14336
model.layers.21.mlp.up_proj.weight => 14336
model.layers.21.post_attention_layernorm.weight => 4096
model.layers.21.self_attn.k_proj.weight => 1024
model.layers.21.self_attn.o_proj.weight => 4096
model.layers.21.self_attn.q_proj.weight => 4096
model.layers.21.self_attn.v_proj.weight => 1024
model.layers.22.input_layernorm.weight => 4096
model.layers.22.mlp.down_proj.weight => 4096
model.layers.22.mlp.gate_proj.weight => 14336
model.layers.22.mlp.up_proj.weight => 14336
model.layers.22.post_attention_layernorm.weight => 4096
model.layers.22.self_attn.k_proj.weight => 1024
model.layers.22.self_attn.o_proj.weight => 4096
model.layers.22.self_attn.q_proj.weight => 4096
model.layers.22.self_attn.v_proj.weight => 1024
model.layers.23.input_layernorm.weight => 4096
model.layers.23.mlp.down_proj.weight => 4096
model.layers.23.mlp.gate_proj.weight => 14336
model.layers.23.mlp.up_proj.weight => 14336
model.layers.23.post_attention_layernorm.weight => 4096
model.layers.23.self_attn.k_proj.weight => 1024
model.layers.23.self_attn.o_proj.weight => 4096
model.layers.23.self_attn.q_proj.weight => 4096
model.layers.23.self_attn.v_proj.weight => 1024
model.layers.24.input_layernorm.weight => 4096
model.layers.24.mlp.down_proj.weight => 4096
model.layers.24.mlp.gate_proj.weight => 14336
model.layers.24.mlp.up_proj.weight => 14336
model.layers.24.post_attention_layernorm.weight => 4096
model.layers.24.self_attn.k_proj.weight => 1024
model.layers.24.self_attn.o_proj.weight => 4096
model.layers.24.self_attn.q_proj.weight => 4096
model.layers.24.self_attn.v_proj.weight => 1024
model.layers.25.input_layernorm.weight => 4096
model.layers.25.mlp.down_proj.weight => 4096
model.layers.25.mlp.gate_proj.weight => 14336
model.layers.25.mlp.up_proj.weight => 14336
model.layers.25.post_attention_layernorm.weight => 4096
model.layers.25.self_attn.k_proj.weight => 1024
model.layers.25.self_attn.o_proj.weight => 4096
model.layers.25.self_attn.q_proj.weight => 4096
model.layers.25.self_attn.v_proj.weight => 1024
model.layers.26.input_layernorm.weight => 4096
model.layers.26.mlp.down_proj.weight => 4096
model.layers.26.mlp.gate_proj.weight => 14336
model.layers.26.mlp.up_proj.weight => 14336
model.layers.26.post_attention_layernorm.weight => 4096
model.layers.26.self_attn.k_proj.weight => 1024
model.layers.26.self_attn.o_proj.weight => 4096
model.layers.26.self_attn.q_proj.weight => 4096
model.layers.26.self_attn.v_proj.weight => 1024
model.layers.27.input_layernorm.weight => 4096
model.layers.27.mlp.down_proj.weight => 4096
model.layers.27.mlp.gate_proj.weight => 14336
model.layers.27.mlp.up_proj.weight => 14336
model.layers.27.post_attention_layernorm.weight => 4096
model.layers.27.self_attn.k_proj.weight => 1024
model.layers.27.self_attn.o_proj.weight => 4096
model.layers.27.self_attn.q_proj.weight => 4096
model.layers.27.self_attn.v_proj.weight => 1024
model.layers.28.input_layernorm.weight => 4096
model.layers.28.mlp.down_proj.weight => 4096
model.layers.28.mlp.gate_proj.weight => 14336
model.layers.28.mlp.up_proj.weight => 14336
model.layers.28.post_attention_layernorm.weight => 4096
model.layers.28.self_attn.k_proj.weight => 1024
model.layers.28.self_attn.o_proj.weight => 4096
model.layers.28.self_attn.q_proj.weight => 4096
model.layers.28.self_attn.v_proj.weight => 1024
model.layers.29.input_layernorm.weight => 4096
model.layers.29.mlp.down_proj.weight => 4096
model.layers.29.mlp.gate_proj.weight => 14336
model.layers.29.mlp.up_proj.weight => 14336
model.layers.29.post_attention_layernorm.weight => 4096
model.layers.29.self_attn.k_proj.weight => 1024
model.layers.29.self_attn.o_proj.weight => 4096
model.layers.29.self_attn.q_proj.weight => 4096
model.layers.29.self_attn.v_proj.weight => 1024
model.layers.30.input_layernorm.weight => 4096
model.layers.30.mlp.down_proj.weight => 4096
model.layers.30.mlp.gate_proj.weight => 14336
model.layers.30.mlp.up_proj.weight => 14336
model.layers.30.post_attention_layernorm.weight => 4096
model.layers.30.self_attn.k_proj.weight => 1024
model.layers.30.self_attn.o_proj.weight => 4096
model.layers.30.self_attn.q_proj.weight => 4096
model.layers.30.self_attn.v_proj.weight => 1024
model.layers.31.mlp.gate_proj.weight => 14336
model.layers.31.mlp.up_proj.weight => 14336
model.layers.31.self_attn.k_proj.weight => 1024
model.layers.31.self_attn.o_proj.weight => 4096
model.layers.31.self_attn.q_proj.weight => 4096
model.layers.31.self_attn.v_proj.weight => 1024
lm_head.weight => 128256
model.layers.31.input_layernorm.weight => 4096
model.layers.31.mlp.down_proj.weight => 4096
model.layers.31.post_attention_layernorm.weight => 4096
model.norm.weight => 4096
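
One way I'm thinking of fixing the ordering (rough sketch, not tested; allKeys here stands for the tensor names collected from all shards): sort numerically by layer index instead of relying on the order safe_open returns, and give the non-layer tensors fixed positions:

    import re

    def layerSortKey(name):
        # Embeddings first, numbered layers by index, lm_head / final norm last.
        # Where exactly lm_head.weight and model.norm.weight must go still has
        # to match what Distributed Llama expects.
        if name == 'model.embed_tokens.weight':
            return (0, 0, name)
        m = re.match(r'model\.layers\.(\d+)\.', name)
        if m:
            return (1, int(m.group(1)), name)
        return (2, 0, name)

    orderedKeys = sorted(allKeys, key=layerSortKey)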

@b4rtaz
Owner

b4rtaz commented May 15, 2024

I recommend using the same approach as in the convert_pth method: build a list of layer names and pass it to the loop. BTW, this loop could be extracted from the two functions.
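Something along these lines, for example (just a sketch, not tested; the per-layer order below is illustrative and must be kept in sync with what convert_pth exports, and nLayers would come from the model config):

    def buildLayerNames(nLayers):
        # Build the tensor names in the order the converter should export them,
        # then one shared loop can iterate over this list for both the .pth
        # and safetensors paths.
        names = ['model.embed_tokens.weight']
        for i in range(nLayers):
            for suffix in ('input_layernorm', 'self_attn.q_proj', 'self_attn.k_proj',
                           'self_attn.v_proj', 'self_attn.o_proj', 'mlp.gate_proj',
                           'mlp.down_proj', 'mlp.up_proj', 'post_attention_layernorm'):
                names.append(f'model.layers.{i}.{suffix}.weight')
        names += ['model.norm.weight', 'lm_head.weight']
        return names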

@b4rtaz
Owner

b4rtaz commented May 24, 2024

@DifferentialityDevelopment I'm closing this pull request. The convert-hf.py script introduced in version 0.7.0 supports the safetensors format and three model types.

@b4rtaz b4rtaz closed this May 24, 2024