
Support for Zephyr and other "StableLmForCausalLM" models? #1649

Open
BBC-Esq opened this issue Mar 26, 2024 · 2 comments
BBC-Esq commented Mar 26, 2024

Any plans to support conversion of "StableLmForCausalLM" models? I've noticed that they're very good; for example, the new Zephyr model here:

https://huggingface.co/stabilityai/stablelm-zephyr-3b

Amazing performance for a 3B model, much better than Phi-2 IMHO. Support was added to Transformers in v4.38.0:

https://github.com/huggingface/transformers/releases/tag/v4.38.0

Here's the link to a description of the model architecture to help:

https://huggingface.co/docs/transformers/v4.38.2/en/model_doc/stablelm
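For context, here is a minimal sketch of the gap: a conversion attempt has to match the `architectures` entry in the model's config.json against what the converter knows. The `SUPPORTED` set below is an illustrative subset, not CTranslate2's authoritative list, and `convertible` is a hypothetical helper, not part of the library:

```python
# Sketch: check a model's declared architecture before attempting conversion.
# SUPPORTED is an illustrative subset of converter-supported architectures,
# not the authoritative CTranslate2 list.
import json

SUPPORTED = {
    "LlamaForCausalLM",
    "GemmaForCausalLM",
    "GPT2LMHeadModel",
    "WhisperForConditionalGeneration",
}

def convertible(config_json: str) -> bool:
    """Return True if every architecture declared in config.json is supported."""
    config = json.loads(config_json)
    return all(arch in SUPPORTED for arch in config.get("architectures", []))

# stablelm-zephyr-3b declares "StableLmForCausalLM", so this reports False today.
zephyr_config = '{"architectures": ["StableLmForCausalLM"]}'
print(convertible(zephyr_config))  # → False
```

Once support lands, conversion should presumably be the usual `ct2-transformers-converter --model stabilityai/stablelm-zephyr-3b --output_dir <dir>` invocation.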

@BBC-Esq BBC-Esq changed the title Support for Zephr and other "StableLmForCausalLM" models? Support for Zephyr and other "StableLmForCausalLM" models? Apr 22, 2024

BBC-Esq commented Apr 22, 2024

Here is yet another badass model, @minhthuc2502. I'd love to help create a converter but I'm not an expert. It's the 1.6B version of Zephyr:

https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b

It kicks ass for its size. The only other small model with a context size over 4,000 is Gemma, which, at least in my testing, royally sucks (referring to Gemma 2B, newest version 1.1 included).

Currently, the only reasonable option for building a chat application with CTranslate2 using a model smaller than 7B is Gemma. I say "reasonable" because the Phi converter is currently broken due to changes in Phi-2, and, at any rate, Phi-2 only has a context of 2048.

Zephyr 3B and Zephyr 1.6B are the best in their class, way better than Gemma 2B. Another viable option would be a converter for Qwen, which actually has a 0.5B model.

Here are tests for Gemma and others on a basic RAG question. Gemma 2B only got the question half right no matter how many beams I used. However, even the Zephyr 1.6B model gave a 100% correct answer at a beam size of 1.

In short, Gemma 2B is fast, but sucks, while Zephyr is only slightly less fast, but IS ABSOLUTELY AWESOME.

NOTE: The models in the legend with "ct2" in their names are obviously CTranslate2 models. The others were tested using Transformers along with bitsandbytes (4-bit), just FYI.

Lastly, llama.cpp already supports Zephyr, Qwen, and others, but I'd rather not switch due to the additional dependency... Let me know, @minhthuc2502, if you'll reconsider making this a higher priority. I know you're busy... thanks, dude.

(image: benchmark chart comparing the listed models on the RAG question)


BBC-Esq commented Apr 22, 2024

To maybe save you a few minutes, I've gathered the following information for whoever picks this up:

  1. The config.json states that the architecture is "StableLmForCausalLM"

  2. I think this is the relevant model doc: https://huggingface.co/docs/transformers/v4.40.0/en/model_doc/stablelm

  3. Additional info: https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo

Based on this snippet, hopefully it wouldn't be too complicated to create a converter for it...
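One wrinkle a converter would have to handle: per the HF model doc above, StableLM uses partial rotary embeddings, i.e. only a fraction of each attention head's dimensions get RoPE, controlled by `partial_rotary_factor` in the config. A back-of-the-envelope sketch (the config values below are what I believe stablelm-zephyr-3b uses; double-check against the actual config.json):

```python
# Sketch of how StableLM sizes its rotary embedding: only a fraction of each
# attention head's dimensions are rotated (partial RoPE). Config values are
# assumed from stablelm-zephyr-3b's config.json; verify before relying on them.
hidden_size = 2560            # assumed for stablelm-zephyr-3b
num_attention_heads = 32      # assumed
partial_rotary_factor = 0.25  # StableLM default per the HF model doc

head_dim = hidden_size // num_attention_heads         # 80
rotary_ndims = int(head_dim * partial_rotary_factor)  # 20 dims get RoPE

print(head_dim, rotary_ndims)  # → 80 20
```

The converter would need to carry that reduced rotary dimension through, I believe much like the existing handling of GPT-J-style `rotary_dim`, rather than assuming full-head RoPE as in Llama.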

(image: snippet of the StableLM modeling code)
