LlamaTerm is a simple CLI utility that lets you use local LLM models easily, with some additional features.
⚠️ Currently this project supports models that use the ChatML prompt format or something similar, for example Phi-3-mini and Llama 3 GGUFs.
- Give local files to the model using square brackets:
  ```
  User: Can you explain the code in [helloworld.c] please?
  ```
- More coming soon
You can set up LlamaTerm by:
- Renaming `example.env` to `.env`
- Modifying `.env` so that the model path corresponds (you may also need to edit `EOS` and `PREFIX_TEMPLATE`); see the sketch after this list
- Setting `REAL_TIME=0` in the `.env` if you need syntax highlighting for code and markdown. Note that you will lose real-time output generation.
- Installing the Python dependencies with `pip install -r requirements.txt`
  - If you have a CUDA GPU, install with cuBLAS acceleration:
    ```
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --upgrade --force-reinstall --no-cache llama-cpp-python
    ```
  - If you have an AMD GPU, install with HIP acceleration:
    ```
    CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=<gpu arch or compatible arch>" FORCE_CMAKE=1 CXX=/opt/rocm/bin/hipcc pip install llama-cpp-python --upgrade --force-reinstall --no-cache
    ```
  - For more info see llama-cpp-python.
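A minimal `.env` might look like the sketch below. `EOS`, `PREFIX_TEMPLATE`, and `REAL_TIME` are mentioned in this README, but the `MODEL_PATH` key name and all values here are assumptions; copy the real keys from `example.env`.

```
# Minimal .env sketch; MODEL_PATH is a hypothetical key name,
# check example.env for the actual keys and values.
MODEL_PATH=/path/to/model.gguf
# Set REAL_TIME=0 to enable syntax highlighting (disables real-time output).
REAL_TIME=1
```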
Run LlamaTerm by adding the project directory to your `PATH` and then running `llamaterm`.
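For example, assuming you cloned the repository to `~/llamaterm` (a hypothetical location):

```
# Add the project directory to PATH for this shell session, then start the CLI.
export PATH="$PATH:$HOME/llamaterm"
llamaterm
```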
For the following models you will just need to rename the corresponding `example-*.env` file to `.env` (see the command after this list):
- Llama 3 8B Instruct [RECOMMENDED]
- Phi 3 Mini Instruct [RECOMMENDED]
- OpenHermes 2.5 Mistral 7B GGUF
- Zephyr Beta 7B GGUF
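For example, for Llama 3 (the exact filename is an assumption; use the `example-*.env` file actually shipped with the repository):

```
# Hypothetical filename; list the repo to find the real example-*.env names.
cp example-llama3.env .env
```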
All other models with a prompt template similar to ChatML are supported, but you will need to customize some fields like `PREFIX_TEMPLATE`, `EOS`, etc. in the `.env`.
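As a rough illustration (a sketch assuming the typical ChatML tokens; the exact syntax LlamaTerm expects for these fields is not documented here), a ChatML-style model usually ends each turn with `<|im_end|>` and opens it with `<|im_start|>` plus the role name:

```
# Hypothetical values for a ChatML-style model; check your model card
# and the shipped example-*.env files for the real syntax.
EOS=<|im_end|>
PREFIX_TEMPLATE=<|im_start|>{role}
```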