krasserm/grammar-based-agents

Modular open LLM agents via prompt chaining and schema-guided generation

Work in progress


Getting started

Agent with zero-shot planner

Download the models (a planner, a function-calling model and a general-purpose model):

mkdir -p models

wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q8_0.gguf?download=true \
  -O models/mistral-7b-instruct-v0.2.Q8_0.gguf

wget https://huggingface.co/TheBloke/NexusRaven-V2-13B-GGUF/resolve/main/nexusraven-v2-13b.Q8_0.gguf?download=true \
  -O models/nexusraven-v2-13b.Q8_0.gguf

wget https://huggingface.co/krasserm/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf?download=true \
  -O models/Meta-Llama-3-8B-Instruct-Q8_0.gguf

Start a llama.cpp server for each model, each in a separate terminal:
docker run --gpus all --rm -p 8081:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/mistral-7b-instruct-v0.2.Q8_0.gguf -c 1024 --n-gpu-layers 33 --host 0.0.0.0 --port 8080

docker run --gpus all --rm -p 8084:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/Meta-Llama-3-8B-Instruct-Q8_0.gguf -c 2048 --n-gpu-layers 33 --host 0.0.0.0 --port 8080

docker run --gpus all --rm -p 8089:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/nexusraven-v2-13b.Q8_0.gguf --n-gpu-layers 41 --host 0.0.0.0 --port 8080
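
Once the servers are running, a quick sanity check confirms they respond. This is a minimal sketch against the llama.cpp server's /completion endpoint; the prompt is an arbitrary placeholder, not the prompt format the agent actually uses:

# Plain completion request to the planner server (host port 8081).
curl -s http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?", "n_predict": 32}'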
Agent with fine-tuned planner

Download the models (the fine-tuned gba-planner replaces the zero-shot planner; the other two models are the same as above):

mkdir -p models

wget https://huggingface.co/krasserm/gba-planner-7B-v0.1-GGUF/resolve/main/gba-planner-7B-v0.1-Q8_0.gguf?download=true \
  -O models/gba-planner-7B-v0.1-Q8_0.gguf

wget https://huggingface.co/TheBloke/NexusRaven-V2-13B-GGUF/resolve/main/nexusraven-v2-13b.Q8_0.gguf?download=true \
  -O models/nexusraven-v2-13b.Q8_0.gguf

wget https://huggingface.co/krasserm/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf?download=true \
  -O models/Meta-Llama-3-8B-Instruct-Q8_0.gguf

Start a llama.cpp server for each model, each in a separate terminal:
docker run --gpus all --rm -p 8082:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/gba-planner-7B-v0.1-Q8_0.gguf -c 1024 --n-gpu-layers 33 --host 0.0.0.0 --port 8080

docker run --gpus all --rm -p 8084:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/Meta-Llama-3-8B-Instruct-Q8_0.gguf -c 2048 --n-gpu-layers 33 --host 0.0.0.0 --port 8080

docker run --gpus all --rm -p 8089:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-858f6b7 \
  -m /models/nexusraven-v2-13b.Q8_0.gguf --n-gpu-layers 41 --host 0.0.0.0 --port 8080
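
Schema-guided generation works by constraining decoding with a grammar. As an illustration only (this is not the planner's actual output schema), the following sketch restricts the fine-tuned planner server on port 8082 to one of two literal completions via the llama.cpp server's GBNF grammar parameter:

# Grammar-constrained request: the model can only emit "calculate" or "respond".
curl -s http://localhost:8082/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "User request: What is 2 + 2?\nNext step:",
    "n_predict": 8,
    "grammar": "root ::= \"calculate\" | \"respond\""
  }'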

JSON mode example

Download the models:

mkdir -p models

wget https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGUF/resolve/main/llama-2-70b-chat.Q4_0.gguf?download=true \
  -O models/llama-2-70b-chat.Q4_0.gguf

wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q8_0.gguf?download=true \
  -O models/mistral-7b-instruct-v0.1.Q8_0.gguf

Start the two servers, each in a separate terminal:
docker run --gpus all --rm -p 8080:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda-052051d8ae4639a1c3c61e7da3237bcc572469d4 \
  -m /models/llama-2-70b-chat.Q4_0.gguf --n-gpu-layers 83 --host 0.0.0.0 --port 8080

docker run --gpus all --rm -p 8081:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda-052051d8ae4639a1c3c61e7da3237bcc572469d4 \
  -m /models/mistral-7b-instruct-v0.1.Q8_0.gguf --n-gpu-layers 33 --host 0.0.0.0 --port 8080
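
With both servers up, a JSON-mode request constrains decoding to valid JSON via a GBNF grammar. The sketch below is an assumption about usage, not code from this repository; it targets the Mistral server on port 8081 and forces a single-field JSON object. The grammar string is escaped twice, once for JSON and once for GBNF, so the request body is written to a file first for readability:

# Request body with a grammar that only admits
# {"sentiment": "positive"} or {"sentiment": "negative"}.
cat > request.json <<'EOF'
{
  "prompt": "Review: I loved this movie. Sentiment as JSON:",
  "n_predict": 32,
  "grammar": "root ::= \"{\\\"sentiment\\\": \\\"\" (\"positive\" | \"negative\") \"\\\"}\""
}
EOF

curl -s http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d @request.json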

Create environment

conda env create -f environment.yml
conda activate grammar-based-agents

If you additionally want to fine-tune the planner module:

conda env create -f environment-autotrain.yml
conda activate grammar-based-agents-autotrain
