OpenELM support #7359
Conversation
Fix formatting
It looks like context shift currently causes crashes. A few other functions seem like they will be broken as well.
We already have this logic for the Refact models: `convert-hf-to-gguf.py`, lines 1141 to 1143 at commit 8513724. You can try to reuse it in a similar way for OpenELM.
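For context, the referenced Refact logic splits a fused gate/up projection while converting tensors. A paraphrased sketch (reconstructed from the conversion script, not a verbatim quote of those lines):

```python
# Paraphrased sketch (not a verbatim quote) of the Refact handling in
# convert-hf-to-gguf.py: slice the fused mlp.gate_up_proj weight into
# separate FFN_GATE and FFN_UP tensors along dim 0, ff_dim rows each.
if bid is not None and name == f"transformer.h.{bid}.mlp.gate_up_proj.weight":
    tensors.append((self.format_tensor_name(gguf.MODEL_TENSOR.FFN_GATE, bid),
                    data_torch[:ff_dim]))
    tensors.append((self.format_tensor_name(gguf.MODEL_TENSOR.FFN_UP, bid),
                    data_torch[ff_dim:]))
```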
Have you run
We'll probably need to generalize the head number to be determined per layer. Do you need some assistance with that?
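As a rough illustration of the conversion-script side, a minimal sketch that writes the head counts as per-layer arrays rather than scalars. The `openelm.*` key names are placeholders, not a confirmed GGUF schema, and the `num_query_heads`/`num_kv_heads` lists are my assumption about OpenELM's upstream `config.json`:

```python
import gguf

# Hypothetical sketch: store head counts as one-entry-per-layer arrays.
# llama.cpp would then pick the per-layer value when building each
# attention layer instead of using a single global head count.
class OpenELMModel(Model):  # Model: the base class in convert-hf-to-gguf.py
    def set_gguf_parameters(self):
        self.gguf_writer.add_block_count(self.hparams["num_transformer_layers"])
        # OpenELM's config.json stores these as lists, one entry per layer
        self.gguf_writer.add_array("openelm.attention.head_count",
                                   self.hparams["num_query_heads"])
        self.gguf_writer.add_array("openelm.attention.head_count_kv",
                                   self.hparams["num_kv_heads"])
```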
@icecream95 I might jump back on this, because I'm curious about where I got stuck.
Fixes: #6868.
Thanks to @joshcarp for an initial try at doing this (#6986); it was very helpful as a source to copy-paste from and check against.
- Currently a bunch of the configuration is hardcoded into `llama.cpp`, so only the 270M model works at this point.
- The `ffn_up` tensors in the converted model are actually concatenations of `ffn_gate` and `ffn_up`; perhaps the conversion script should separate them out? (A possible split is sketched after this list.)
- The 270M model is impressively fast, and works fine for generation, but "Chat" mode in `./server` doesn't really work well. Perhaps that's just because it hasn't been finetuned for that? I'm not really sure.
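On the `ffn_up` point above, a minimal sketch of what separating them in the conversion script could look like, reusing the Refact-style split mentioned in the review comments. The tensor name `.ffn.proj_1.weight` and the gate-first layout are assumptions based on my reading of the OpenELM modeling code (its FFN chunks the fused projection into gate, then up), not something confirmed by this PR:

```python
import torch
import gguf

# Hypothetical OpenELM converter class for convert-hf-to-gguf.py.
class OpenELMModel(Model):  # Model: the base class in convert-hf-to-gguf.py
    def modify_tensors(self, data_torch: torch.Tensor, name: str, bid: int | None):
        # proj_1 fuses the gate and up projections; slice the weight in
        # half along dim 0, gate rows first, up rows second.
        if bid is not None and name.endswith(".ffn.proj_1.weight"):
            ff_dim = data_torch.shape[0] // 2
            return [
                (self.format_tensor_name(gguf.MODEL_TENSOR.FFN_GATE, bid), data_torch[:ff_dim]),
                (self.format_tensor_name(gguf.MODEL_TENSOR.FFN_UP, bid), data_torch[ff_dim:]),
            ]
        # everything else maps through the usual name translation
        return [(self.map_tensor_name(name), data_torch)]
```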