Unable to load 70B llama2 on cpu (llama cpp) #66

Open
Dougie777 opened this issue Sep 4, 2023 · 1 comment

Comments

@Dougie777

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
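For reference, the reported shape matches the 70B model's grouped-query attention layout: 70B uses 8 KV heads, so the wk projection is 8192 × (8192 / 8) = 8192 × 1024, while 7B/13B use standard multi-head attention and load without extra hints. A minimal sketch of a load that accounts for this, assuming the GGML-era llama-cpp-python API with its n_gqa parameter (not something configured in the .env below), would look like:

```python
# Sketch only: load a 70B ggmlv3 checkpoint with llama-cpp-python.
# n_gqa=8 tells the GGML loader that the model uses grouped-query attention
# (8 KV heads), which is why wk.weight is 8192 x 1024 rather than 8192 x 8192.
# The model path and n_ctx value mirror the .env quoted below; everything else
# here is an assumption, not llama2-webui's actual loader code.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",
    n_ctx=4000,   # MAX_INPUT_TOKEN_LENGTH from the .env below
    n_gqa=8,      # needed for 70B ggmlv3 checkpoints; 7B/13B load without it
)

out = llm("Q: What is grouped-query attention? A:", max_tokens=64)
print(out["choices"][0]["text"])
```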

The exact same settings and quantization work for 7B and 13B. Here is my .env:

MODEL_PATH = ""

if MODEL_PATH is "", default llama.cpp/gptq models

will be downloaded to: ./models

Example ggml path:

#MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
MODEL_PATH = "./models/llama-2-70b-chat.ggmlv3.q4_0.bin"
#MODEL_PATH = "./models/llama-2-13b-chat.ggmlv3.q4_0.bin"

options: llama.cpp, gptq, transformers

BACKEND_TYPE = "llama.cpp"

only for transformers bitsandbytes 8 bit

LOAD_IN_8BIT = False

MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DEFAULT_SYSTEM_PROMPT = ""
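For context, here is a minimal sketch of how a .env like this is typically consumed: read the variables (for example with python-dotenv), then dispatch on BACKEND_TYPE to build the backend. The structure and helper calls below are illustrative assumptions, not code taken from the llama2-webui repository.

```python
# Illustrative sketch of consuming the .env above; variable names mirror the
# file, but this is not llama2-webui's own loader code.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

model_path = os.getenv("MODEL_PATH", "")
backend_type = os.getenv("BACKEND_TYPE", "llama.cpp")
load_in_8bit = os.getenv("LOAD_IN_8BIT", "False").lower() == "true"

if backend_type == "llama.cpp":
    from llama_cpp import Llama
    # For 70B ggmlv3 checkpoints the n_gqa=8 hint (see the sketch above) is
    # the usual workaround for the wk.weight shape error.
    llm = Llama(model_path=model_path, n_ctx=4000, n_gqa=8)
elif backend_type == "transformers":
    from transformers import AutoModelForCausalLM
    # LOAD_IN_8BIT only applies to this backend (bitsandbytes 8-bit).
    llm = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=load_in_8bit)
else:
    raise ValueError(f"Unsupported BACKEND_TYPE: {backend_type}")
```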

@liltom-eth
Owner

@Dougie777 The env looks good to me. It might be an error specific to the 70B model.
