Very slow generation #83

Open
jaslatendresse opened this issue Dec 13, 2023 · 1 comment

Comments

@jaslatendresse

I am running this on a Mac M1 with 16 GB of RAM, using app.py for simple text generation. Running llama.cpp from the terminal is much faster, but when I use the backend through app.py, generation is very slow. Any ideas?

@arnaudberenbaum

Hello! I was in the same situation and found the solution:

  1. First, check whether your Python env is configured for arm64 and not x86:
    python -c "import platform; print(platform.platform())"
    It should return:
    macOS-14.2.1-arm64-arm-64bit

  2. If it's not, create a new env (I'm using Conda):
    CONDA_SUBDIR=osx-arm64 conda create -n your_env python=the_version_you_want

  3. Clone the GitHub repo and install the llama2-wrapper package:
    python -m pip install llama2-wrapper

  4. Then reinstall the llama-cpp-python package for arm64 with Metal enabled:
    python -m pip uninstall llama-cpp-python -y
    CMAKE_ARGS="-DLLAMA_METAL=on" pip install -U llama-cpp-python --no-cache-dir
    python -m pip install 'llama-cpp-python[server]'

  5. When that's done, modify the file "~/llama2-webui/llama2_wrapper/model.py":

  • in the create_llama2_model function (line 118), add the parameter "n_gpu_layers=-1" (a standalone sanity check is sketched after this list):
    @classmethod
    def create_llama2_model(
        cls, model_path, backend_type, max_tokens, load_in_8bit, verbose
    ):
        if backend_type is BackendType.LLAMA_CPP:
            from llama_cpp import Llama

            model = Llama(
                model_path=model_path,
                n_ctx=max_tokens,
                n_batch=max_tokens,
                verbose=verbose,
                n_gpu_layers=-1,  # added: force all layers onto the Apple Silicon GPU (Metal)
            )
        # ... rest of the method unchanged
  6. Profit! It should now be fast to generate content (on a MacBook Pro M1 Pro with 16 GB of memory, it went from 1 token every 2 seconds to about 10 tokens per second).
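For reference, here is a minimal standalone sketch (independent of llama2-webui) to confirm that Metal offload is actually active after step 4. The model path and n_ctx value are placeholders; point them at whatever GGUF file and context size you already use with app.py. With verbose=True, the load log should mention Metal when the GPU is being used.

import time
from llama_cpp import Llama

MODEL_PATH = "./models/llama-2-7b-chat.Q4_K_M.gguf"  # placeholder, point at your own GGUF file

llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,           # placeholder context size
    n_gpu_layers=-1,      # offload all layers; on a CPU-only/x86 build this has no effect
    verbose=True,         # load log should mention Metal when offload works
)

start = time.time()
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
elapsed = time.time() - start

tokens = out["usage"]["completion_tokens"]
print(out["choices"][0]["text"])
print(f"{tokens} tokens in {elapsed:.1f}s ({tokens / elapsed:.1f} tok/s)")

If this runs fast but app.py is still slow, the remaining bottleneck is the wrapper's own Llama(...) call, which is exactly what step 5 patches.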

Hope it helped! :)
