
LLaMa Web

LLaMa Web is a web interface for chatting and experimenting with LLaMa-based models.

Installation

Requirements

  • Node.js 18
  • Yarn (API) and pnpm (client)
  • MongoDB (for saving chats)
  • llama.cpp
  • Keycloak server (optional, for authentication)

Building

cd client && pnpm install --frozen-lockfile && pnpm build
cd ..
cd api && yarn install --frozen-lockfile && yarn build
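
If Yarn or pnpm is missing, Node.js 18 bundles Corepack, which can provide both (using Corepack is an assumption about your setup; any other install method works):

corepack enable   # activates the yarn and pnpm shims bundled with Node.js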

Configuration

Copy the example.env file in both folders to .env and edit it:

cp client/example.env client/.env && nano client/.env
cp api/example.env api/.env && nano api/.env
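
As a rough sketch of what the api .env might look like (only DB, LLAMA_PATH, LLAMA_EMBEDDING_PATH, and SKIP_AUTH are named elsewhere in this README; the values and any other keys are assumptions, so defer to api/example.env):

# api/.env — illustrative values only; see api/example.env for the real keys
DB=mongodb://localhost:27017/llama-web        # MongoDB connection (assumed format)
LLAMA_PATH=/path/to/llama.cpp/main            # llama.cpp binary (assumed path)
LLAMA_EMBEDDING_PATH=/path/to/llama.cpp/embedding
SKIP_AUTH=true                                # skip Keycloak authentication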

Running

In both folders, run the following command:

yarn start
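
Concretely, assuming two terminals opened at the repository root:

cd api && yarn start
# in a second terminal
cd client && yarn start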

Running with Docker

Requirements

  • Docker
  • Docker Compose
  • The docker-compose.yml file from this repository

Configuration

Edit the docker-compose.yml file and change the environment variables.

However, you can't change the DB, LLAMA_PATH, and LLAMA_EMBEDDING_PATH variables.

If you don't want to use Keycloak, enable the SKIP_AUTH variable by setting it to true in both the client and the api.
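
For example, SKIP_AUTH can be set under each service's environment section, along these lines (the service names api and client are assumptions; use the names in the downloaded file):

services:
  api:
    environment:
      - SKIP_AUTH=true
  client:
    environment:
      - SKIP_AUTH=true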

Running

docker-compose up -d
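
To verify the stack came up, the standard Docker Compose commands apply:

docker-compose ps        # list container status
docker-compose logs -f   # follow service logs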

Adding a model

Note

This guide assumes you want to use TheBloke/Llama-2-7B-Chat-GGUF. Good to know: this project is tested with TheBloke's GGUF models.

Note

The steps are the same with or without Docker.

  1. Go to the playground
  2. Then go to the Models tab
  3. Click on Install a new model
  4. Enter the name of the model (e.g. llama-2-7b-chat)
  5. Enter the download link (e.g. https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf)
  6. Enter the model chat template (can be found here; a rough example follows this list)
  7. Click on Install the new model
  8. Wait until the model is installed; you can refresh the page to see when it is done.
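
As an illustration of step 6, the standard Llama-2 chat format looks roughly like this; the exact placeholder syntax this project expects is an assumption, so check the template reference from step 6:

<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{prompt} [/INST]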

Using an alternative compute backend

Warning

Built-in model management is not supported for models running on an alternative compute backend; you have to configure the alternative backend directly to support the model you want to use. No support will be provided for this. It is still possible to use built-in model management and an alternative compute backend side by side.

Note

You can disable the alternative compute backend by setting ALLOW_ALTERNATIVE_COMPUTE_BACKEND to false in the api .env file.
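
That is, in the api .env file:

ALLOW_ALTERNATIVE_COMPUTE_BACKEND=false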

Requirements

  • A server that can run an app similar to examples/alt-backend/mixtral8x7B.py (a minimal sketch follows below)
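
As a loose sketch of such an app, here is a minimal HTTP server in the spirit of examples/alt-backend/mixtral8x7B.py. The route, payload shape, and generate() stub are all assumptions; the actual contract is defined by the example file in the repository:

# Hypothetical alternative compute backend sketch; the real contract lives in
# examples/alt-backend/mixtral8x7B.py.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str) -> str:
    # Placeholder: call your model runtime (e.g. llama.cpp bindings) here.
    return f"echo: {prompt}"

@app.route("/completion", methods=["POST"])  # route name is an assumption
def completion():
    payload = request.get_json(force=True)
    prompt = payload.get("prompt", "")
    return jsonify({"content": generate(prompt)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)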

Setup in LLaMa Web

  1. Go to the playground
  2. Then go to the Models tab
  3. Click on Install a new model
  4. Enter the name of the model (e.g. llama-2-7b-chat)
  5. Click on Use alternative compute backend
  6. Enter the compute backend URL (e.g. https://my-alternative-compute-backend.domain.com)
  7. Click on Add the alternative backend model
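
Once added, you can sanity-check that the backend URL is reachable (endpoint paths are backend-specific, so this only confirms the server responds):

curl -i https://my-alternative-compute-backend.domain.com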