
Flash Attention and native PyTorch weights #5

Open
shaltielshmid opened this issue Feb 10, 2024 · 1 comment

Comments

@shaltielshmid

I saw Flash Attention on the TODO list, so I wanted to bring the announcement here to your attention.

Two packages were announced there:

1] Loading model weights saved in the PyTorch format or the safetensors format, including handling for HuggingFace's sharded checkpoints (see the sketch after this list)

2] Flash Attention - self-explanatory :)
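
For context, a minimal sketch of what loading such weights could look like from C#, assuming the weight-loading package is TorchSharp.PyBridge; MyModel and all the paths are made up for illustration, and the announcement linked above is the authoritative reference for the actual API:

using TorchSharp;
using TorchSharp.PyBridge;

// Usage sketch only, assuming TorchSharp.PyBridge extension methods;
// MyModel is a hypothetical TorchSharp nn.Module subclass.
var model = new MyModel("model");

// Weights saved in Python via torch.save(model.state_dict(), "weights.pt"):
model.load_py("weights.pt");

// Weights in the safetensors format:
model.load_safetensors("model.safetensors");

// A sharded HuggingFace checkpoint directory (assumption here: the package
// exposes this as load_checkpoint):
model.load_checkpoint("path_to_checkpoint_dir");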

@shaltielshmid
Author

Also, I saw in one of the Python scripts that you rename some of the weights to match the naming scheme between HuggingFace and llm-sharp. There is a useful attribute you can add to any field to specify the name you want TorchSharp to store it under:

// Registers the field under "some_name" in the module's state_dict
// instead of the C# field name.
[ComponentName("some_name")]
private Module _someOtherName;
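
For reference, a minimal sketch of how that attribute could be used inside a module, following the usage shown above; the class, field names, and layer sizes are made up for illustration:

using TorchSharp;
using TorchSharp.Modules;
using static TorchSharp.torch;
using static TorchSharp.torch.nn;

// Illustration only: by default TorchSharp registers components under their
// field names, so _denseLayer would be saved as "_denseLayer.weight" etc.
// The attribute (used as in the snippet above) makes the stored name match
// HuggingFace's "dense" key instead.
public class MyBlock : Module<Tensor, Tensor>
{
    [ComponentName("dense")]
    private Linear _denseLayer;

    public MyBlock(string name) : base(name)
    {
        _denseLayer = Linear(768, 768);
        RegisterComponents();
    }

    public override Tensor forward(Tensor x) => _denseLayer.forward(x);
}

With that in place, the stored keys line up with the HuggingFace checkpoint directly, so no renaming script is needed.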
