I am running this on a Mac M1 with 16 GB RAM, using app.py for simple text generation. Running llama.cpp from the terminal is much faster, but when I use the backend through app.py it is very slow. Any ideas?
Hello! I was in the same situation and found the solution:
First, check whether your Python env is configured for arm64 rather than x86: python -c "import platform; print(platform.platform())"
It should return something like: macOS-14.2.1-arm64-arm-64bit
If it's not, you need to create a new env (I'm using Conda): CONDA_SUBDIR=osx-arm64 conda create -n your_env python=the_version_you_want
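The arm64 check above can also be done in code. A minimal stdlib sketch (the helper name is mine, not part of the project):

```python
import platform


def running_native_arm64() -> bool:
    """Return True when this interpreter is a native arm64 build.

    An x86_64 Python running under Rosetta reports "x86_64" here even on
    Apple Silicon, which is exactly the slow case described above.
    """
    return platform.machine() == "arm64"


print(platform.platform())       # e.g. macOS-14.2.1-arm64-arm-64bit
print(running_native_arm64())
```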
Clone the GitHub repo and install the llama2-wrapper package: python -m pip install llama2-wrapper
Then reinstall the llama-cpp-python package for arm64 with Metal enabled:
python -m pip uninstall llama-cpp-python -y
CMAKE_ARGS="-DLLAMA_METAL=on" pip install -U llama-cpp-python --no-cache-dir
python -m pip install 'llama-cpp-python[server]'
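Once reinstalled, you can sanity-check the new wheel from Python. This is only a sketch: it assumes a recent llama-cpp-python that exposes the low-level `llama_supports_gpu_offload()` binding, and the `getattr` guard makes it degrade gracefully on older versions or when the package is missing.

```python
def metal_build_ok() -> bool:
    """Best-effort check that llama-cpp-python can offload layers to the GPU."""
    try:
        import llama_cpp
    except ImportError:
        return False  # package not installed in this env
    # Recent llama-cpp-python builds expose this binding; older ones may not.
    check = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    return bool(check()) if check is not None else False


print(metal_build_ok())
```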
When it's done, you need to modify the file "~/llama2-webui/llama2_wrapper/model.py":
in the create_llama2_model function (line 118), you need to add the parameter "n_gpu_layers=-1":
```python
@classmethod
def create_llama2_model(
    cls, model_path, backend_type, max_tokens, load_in_8bit, verbose
):
    if backend_type is BackendType.LLAMA_CPP:
        from llama_cpp import Llama

        model = Llama(
            model_path=model_path,
            n_ctx=max_tokens,
            n_batch=max_tokens,
            verbose=verbose,
            n_gpu_layers=-1,  # I added this line to force the model to run on the GPU (Metal)
        )
```
Profit! It should now be fast to generate content (on a MacBook Pro M1 Pro with 16 GB memory, it went from 1 token every 2 seconds to 10 tokens per second!).
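If you want to reproduce the before/after comparison, you can time generation yourself. A minimal stdlib sketch, where `generate_one_token` is a hypothetical stand-in for a single decoding step (with llama-cpp-python you would time the real completion call instead):

```python
import time


def tokens_per_second(generate_one_token, n_tokens: int = 64) -> float:
    """Rough throughput estimate: run one decoding step n_tokens times."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_one_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


# Example with a dummy step; replace the lambda with a real decoding call.
print(f"{tokens_per_second(lambda: None):.1f} tokens/s")
```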