[ENHANCEMENT] Add Support for 5-bit quantized models #84

Open
TreesPlay opened this issue May 8, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@TreesPlay

Hi, I don't know much about AI, but I've seen a lot of models popping up on HuggingFace recently advertising 5-bit quantisation. Here is an example: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML

I can only load q4_0 and q4_1 models; the newer q4_2, q5_0, and q5_1 formats don't work. Since I recently upgraded my RAM to 64GB to run LLMs on my machine, I'd like to be able to use the newer models.

TreesPlay added the enhancement (New feature or request) label on May 8, 2023
@TreesPlay
Author

For context, I use the latest release. Since it was last updated a month ago, I don't know whether more recent commits have already added support for 5-bit quantisation.

@chmodseven

I have been using some q5_1 models with no problems after compiling llama.cpp myself and putting the resulting main.exe in place of Alpaca Electron's chat.exe. You can follow the "(OPTIONAL) Building llama.cpp from source" section in the README here, although note that for me the second cmake command didn't work and should be `cmake --build . --config Release`, per the llama.cpp README.
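For reference, the steps look roughly like this on Windows. The clone URL and cmake commands come from the llama.cpp README; the Alpaca Electron install path is a placeholder you'll need to adjust for your machine:

```powershell
# Build llama.cpp from source (assumes git, cmake, and the
# Visual Studio build tools are already installed).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release

# Swap the freshly built binary in for Alpaca Electron's bundled one.
# With the Visual Studio generator the built main.exe usually lands
# under bin\Release inside the build directory.
# "C:\path\to\Alpaca-Electron" is a placeholder -- point it at your
# actual install directory, and back up the original chat.exe first.
copy .\bin\Release\main.exe "C:\path\to\Alpaca-Electron\chat.exe"
```

After that, q5_0 and q5_1 GGML models should load, since the replacement binary is built from current llama.cpp sources that understand the newer quantisation formats.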
