
Andromeda-Chain - mastering all the chains

This repository contains both a server and a client package.

The server is (not very creatively) called guidance_server. The client is called AndromedaChain.

Why this package/library? Guidance is an awesome library, but it has limited support in HTTP API backends such as the Oobabooga UI. So I rolled my own HTTP server, which lets me quickly prototype apps that use guidance templates.

I originally created Oasis with a variant of this idea: https://github.com/paolorechia/oasis

Installation

AndromedaChain

pip install andromeda-chain

Guidance Server

Serving the guidance library with local models behind an HTTP server.

Supported methods:

  1. Hugging Face (16bit, 8bit, 4bit)
  2. GPTQ with or without CPU offload
  3. Experimental LLaMA CPP support based on the work of https://github.com/Maximilian-Winter

The server configuration is passed through environment variables, typically via the docker-compose file:

    GENERAL_BASE_IMAGE: GPU
    # CPP Model Example:
    # GENERAL_MODEL_PATH: /models/open-llama-7B-open-instruct.ggmlv3.q4_0.bin
    # GENERAL_TOKENIZER_PATH: /models/VMware_open-llama-7b-open-instruct
    # GENERAL_LOADING_METHOD: CPP

    # GPTQ Model Example:
    GENERAL_MODEL_PATH: /models/vicuna-7B-1.1-GPTQ-4bit-128g
    GENERAL_LOADING_METHOD: GPTQ

    # HF Model Example
    # GENERAL_MODEL_PATH: /models/VMware_open-llama-7b-open-instruct
    # GENERAL_LOADING_METHOD: HUGGING_FACE

    # Guidance Settings
    GUIDANCE_AFTER_ROLE: "|>"
    GUIDANCE_BEFORE_ROLE: "<|"

    # Tokenizer Settings
    TK_BOOL_USE_FAST: false

    # HuggingFace
    HF_BOOL_USE_8_BIT: true
    HF_BOOL_USE_4_BIT: false
    HF_DEVICE_MAP: auto

    # GPTQ
    GPTQ_INT_WBITS: 4
    GPTQ_INT_GROUP_SIZE: 128
    GPTQ_INT_PRE_LOADED_LAYERS: 20
    GPTQ_DEVICE: "cuda"
    GPTQ_BOOL_CPU_OFFLOADING: false

    # LLaMA CPP
    CPP_INT_N_GPU_LAYERS: 300
    CPP_INT_N_THREADS: 12
    CPP_BOOL_CACHING: false
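
For reference, these variables sit under the server service's environment section in the compose file. Below is a minimal sketch of that layout, assuming a service named guidance_server and a local ./models folder mounted into the container (the service name, image placeholder, and volume mapping are illustrative, not the repository's exact file):

    services:
      guidance_server:
        image: <guidance-server-image>  # pick a tag from Docker Hub
        volumes:
          - ./models:/models            # so GENERAL_MODEL_PATH resolves inside the container
        environment:
          GENERAL_BASE_IMAGE: GPU
          GENERAL_MODEL_PATH: /models/vicuna-7B-1.1-GPTQ-4bit-128g
          GENERAL_LOADING_METHOD: GPTQ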

Docker Container

Requirements:

  1. docker-engine
  2. docker-compose v2

If using GPU also:

  1. nvidia-docker: https://github.com/NVIDIA/nvidia-docker

Pulling the image

You can find the image tags on Docker Hub. The easiest way to pull an image is to reuse the docker-compose files:

docker-compose -f docker-compose.gpu.yaml up

Or use the CPU version

docker-compose -f docker-compose.cpu.yaml up

Note that you still need to set up the model (see the Usage section).

LICENSE NOTE: The GPU image is based on nvidia/cuda:12.1.1-runtime-ubuntu22.04, which is subject to the proprietary NVIDIA license. While the software in the Andromeda repository is open source, some layers of the Docker container are not.

Building

Just use the appropriate bash script

./build_gpu.sh

Or:

./build_cpu.sh

Usage

  1. Download an LLM you want to use from Hugging Face.
  2. Create a 'models' directory locally and save the model there (see the sketch after this list).
  3. Set the environment variable MODEL_PATH in docker-compose.gpu.yaml or docker-compose.cpu.yaml, depending on which one you want.
  4. Start the server.
  5. Use the Andromeda package to query the server.
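
A hedged sketch of steps 1-3 in shell (the model repository name is a placeholder; use any model that matches your chosen loading method):

    # Create the local models directory that gets mounted into the container
    mkdir -p models
    # Weight files on Hugging Face are stored with git-lfs
    git lfs install
    git clone https://huggingface.co/<user>/<model-repo> models/<model-repo>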

Using Andromeda Package

from andromeda_chain import AndromedaChain, AndromedaPrompt, AndromedaResponse

chain = AndromedaChain()  # talks to the running guidance_server over HTTP

prompt = AndromedaPrompt(
    name="hello",
    prompt_template="""Howdy: {{gen 'expert_names' temperature=0 max_tokens=300}}""",
    input_vars=[],
    output_vars=["expert_names"]
)

response: AndromedaResponse = chain.run_guidance_prompt(prompt)
# Use the response
print(response.expanded_generation)
print(response.result_vars)
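
Prompts can also declare input variables that are substituted into the guidance template before generation. Here is a hedged sketch, assuming run_guidance_prompt accepts the variable values through an input_vars dict (the question variable and template text below are illustrative, not part of the package):

# Hypothetical prompt with one input variable
qa_prompt = AndromedaPrompt(
    name="question_answer",
    prompt_template="""Question: {{question}} Answer: {{gen 'answer' temperature=0 max_tokens=200}}""",
    input_vars=["question"],
    output_vars=["answer"],
)

# Assumption: input values are passed alongside the prompt
response = chain.run_guidance_prompt(qa_prompt, input_vars={"question": "What does the guidance library do?"})
print(response.result_vars)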
