Unable to load 70B llama2 on cpu (llama cpp) #66

Open
Dougie777 opened this issue Sep 4, 2023 · 1 comment

Comments

@Dougie777

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
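For reference, the reported shape matches the 70B model's grouped-query attention layout: 70B uses 8 KV heads, so the wk projection is 8192 × (8192 / 8) = 8192 × 1024, while 7B/13B use standard multi-head attention and load without extra hints. A minimal sketch of a load that accounts for this, assuming the GGML-era llama-cpp-python API with its n_gqa parameter (not something configured in the .env below), would look like:

```python
# Sketch only: load a 70B ggmlv3 checkpoint with llama-cpp-python.
# n_gqa=8 tells the GGML loader that the model uses grouped-query attention
# (8 KV heads), which is why wk.weight is 8192 x 1024 rather than 8192 x 8192.
# The model path and n_ctx value mirror the .env quoted below; everything else
# here is an assumption, not llama2-webui's actual loader code.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",
    n_ctx=4000,   # MAX_INPUT_TOKEN_LENGTH from the .env below
    n_gqa=8,      # needed for 70B ggmlv3 checkpoints; 7B/13B load without it
)

out = llm("Q: What is grouped-query attention? A:", max_tokens=64)
print(out["choices"][0]["text"])
```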

The exact same settings and quantization work for 7B and 13B. Here is my .env:

MODEL_PATH = ""

if MODEL_PATH is "", default llama.cpp/gptq models

will be downloaded to: ./models

Example ggml path:

#MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
MODEL_PATH = "./models/llama-2-70b-chat.ggmlv3.q4_0.bin"
#MODEL_PATH = "./models/llama-2-13b-chat.ggmlv3.q4_0.bin"

options: llama.cpp, gptq, transformers

BACKEND_TYPE = "llama.cpp"

only for transformers bitsandbytes 8 bit

LOAD_IN_8BIT = False

MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DEFAULT_SYSTEM_PROMPT = ""
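For context, here is a minimal sketch of how a .env like this is typically consumed: read the variables (for example with python-dotenv), then dispatch on BACKEND_TYPE to build the backend. The structure and helper calls below are illustrative assumptions, not code taken from the llama2-webui repository.

```python
# Illustrative sketch of consuming the .env above; variable names mirror the
# file, but this is not llama2-webui's own loader code.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

model_path = os.getenv("MODEL_PATH", "")
backend_type = os.getenv("BACKEND_TYPE", "llama.cpp")
load_in_8bit = os.getenv("LOAD_IN_8BIT", "False").lower() == "true"

if backend_type == "llama.cpp":
    from llama_cpp import Llama
    # For 70B ggmlv3 checkpoints the n_gqa=8 hint (see the sketch above) is
    # the usual workaround for the wk.weight shape error.
    llm = Llama(model_path=model_path, n_ctx=4000, n_gqa=8)
elif backend_type == "transformers":
    from transformers import AutoModelForCausalLM
    # LOAD_IN_8BIT only applies to this backend (bitsandbytes 8-bit).
    llm = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=load_in_8bit)
else:
    raise ValueError(f"Unsupported BACKEND_TYPE: {backend_type}")
```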

@liltom-eth
Owner

@Dougie777 The env looks good to me. It might be an error specific to the 70B model.
