I am trying to run llama-7b-relu.q4.powerinfer.gguf with the following command: PowerInfer/build/bin/main -m ReluLLaMA-7B-PowerInfer-GGUF/llama-7b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "$PROMPT" -c 1000 --n-gpu-layers 20
The prompt size is ~800 tokens. When I run this I get:
not enough space in the buffer (needed 1409024, largest block available 1389584)
GGML_ASSERT: /home/user_name/PowerInfer/ggml-alloc.c:116: !"not enough space in the buffer"
[Thread debugging using libthread_db enabled]
When I use a small prompt (10-20 tokens), it works as expected. Any ideas on how to solve this?
Hi @RachelShalom, can you test it again on the latest main branch? I have tested the same model with a prompt of about 1.9K tokens and a 2048-token context window, and it looks good.
If the error persists, would you mind posting the entire error log? It will help us pinpoint the root cause.
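For anyone hitting the same assertion, the update-and-rebuild steps the maintainer suggests might look like the following. This is a sketch that assumes a standard CMake build of PowerInfer in a `build/` directory; adjust paths and CMake flags (e.g. CUDA options) to match your original build.

```shell
# Pull the latest main branch and rebuild PowerInfer
cd PowerInfer
git pull origin main
cmake -B build
cmake --build build --config Release -j

# Retry with a larger context window (-c), since the maintainer's
# working test used 2048; the buffer that overflowed appears to be
# sized relative to the context.
./build/bin/main -m ../ReluLLaMA-7B-PowerInfer-GGUF/llama-7b-relu.q4.powerinfer.gguf \
  -n 128 -t 8 -p "$PROMPT" -c 2048 --n-gpu-layers 20
```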