Open LLM WebUI

This repository contains a web application for running relatively compact Large Language Models (LLMs) locally.

Installation

Please follow these steps to install the software:

  • Create a new conda environment:
conda create -n ollm python=3.10
conda activate ollm
  • Clone the software repository:
git clone https://github.com/Uminosachi/open-llm-webui.git
cd open-llm-webui

Python Package Installation

General Instructions

  • Install the necessary Python packages by executing:
    pip install -r requirements.txt

Platform-Specific Instructions

  • For Windows (with CUDA support):

    • Install Visual Studio:
      • ⚠️ Important: Make sure to select Desktop development with C++ during the installation process.
    • Copy MSBuild extensions for CUDA as an administrator (adjust the CUDA version as necessary):
      xcopy /e "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\extras\visual_studio_integration\MSBuildExtensions" "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations"
    • Configure the required environment variables for the build (adjust the CUDA version as necessary):
      set PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;%PATH%
      "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
      set FORCE_CMAKE=1 && set CMAKE_ARGS="-DLLAMA_CUDA=on"
    • Install the necessary Python packages (this process may take some time):
      pip install ninja cmake scikit-build-core[pyproject]
      pip install --force-reinstall --no-cache-dir llama-cpp-python
      pip install -r requirements.txt
  • For Linux (with CUDA support):

    • Configure the required environment variables for the build (if not already set):
      export PATH=/usr/local/cuda/bin:${PATH}
    • Install the necessary Python packages:
      CMAKE_ARGS="-DLLAMA_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
      pip install -r requirements.txt
  • For macOS (without CUDA support):

    • Install the necessary Python packages:
      BUILD_CUDA_EXT=0 pip install -r requirements.txt
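
After a CUDA build on Windows or Linux, you can optionally confirm that the GPU is visible from Python before launching the app. This is a minimal sanity-check sketch, assuming PyTorch is present in the environment; it is not part of the installation steps above:

    # check_cuda.py - optional sanity check after a CUDA install
    import torch

    if torch.cuda.is_available():
        # The GPU that the transformers backend will be able to use
        print(f"CUDA available: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA not available - models will run on the CPU")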

Running the application

python ollm_app.py

Downloading the Model

To download the model:

  • Launch this application.
  • Click on the "Download model" button next to the LLM model ID.
  • Wait for the download to complete. (A command-line alternative is sketched below.)
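
If you prefer to fetch a model outside the UI, the same files can be pulled from the command line. A minimal sketch using huggingface_hub; the repository ID below is only an example taken from the model list that follows:

    # manual_download.py - optional alternative to the "Download model" button
    from huggingface_hub import snapshot_download

    # Downloads into ~/.cache/huggingface/hub, the same cache the app reads from
    snapshot_download(repo_id="microsoft/Phi-3-mini-4k-instruct")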

Model List (transformers)

Provider     | Model Names
Microsoft    | Phi-3-mini-4k-instruct, Phi-3-mini-128k-instruct
Google       | gemma-1.1-2b-it, gemma-1.1-7b-it
NVIDIA       | Llama3-ChatQA-1.5-8B
Apple        | OpenELM-1_1B-Instruct, OpenELM-3B-Instruct
Rakuten      | RakutenAI-7B-chat, RakutenAI-7B-instruct
rinna        | youri-7b-chat, bilingual-gpt-neox-4b-instruction-sft, japanese-gpt-neox-3.6b-instruction-sft-v2
TheBloke     | Llama-2-7b-Chat-GPTQ, Kunoichi-7B-GPTQ
Stability AI | stablelm-tuned-alpha-3b, stablelm-tuned-alpha-7b, japanese-stablelm-instruct-beta-7b
  • 🔍 Note: The downloaded model file will be stored in the .cache/huggingface/hub directory of your home directory.
  • Please check the license in the Model Credit section below.

Access and Download Models

Login to Hugging Face

  • Before downloading any models, please log in via the command line using:
    huggingface-cli login
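
If you prefer to authenticate from Python instead of the CLI, huggingface_hub exposes an equivalent login() helper; calling it with no arguments prompts for your access token:

    # hf_login.py - programmatic alternative to `huggingface-cli login`
    from huggingface_hub import login

    # Prompts interactively for a Hugging Face access token and stores it
    login()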
    

Model List (llama.cpp)

Provider     | Model Names
Microsoft    | Phi-3-mini-4k-instruct-q4.gguf, Phi-3-mini-4k-instruct-fp16.gguf
TheBloke     | llama-2-7b-chat.Q4_K_M.gguf
QuantFactory | Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

Using any GGUF file

  • 🔍 File Placement: Place files with the .gguf extension in the models directory within the open-llm-webui folder. These files will then appear in the model list on the llama.cpp tab of the web UI and can be used accordingly.
  • 📝 Metadata Usage: If the metadata of a GGUF model includes tokenizer.chat_template, this template will be used to create the prompts (illustrated in the sketch below).
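
This behavior can be reproduced outside the web UI with llama-cpp-python: when a GGUF file carries tokenizer.chat_template in its metadata, create_chat_completion() formats the prompt with that template automatically. A minimal sketch; the file name is only an example:

    # gguf_chat_template.py - the embedded chat template being picked up automatically
    from llama_cpp import Llama

    # Load a GGUF placed in the models directory; its metadata (including
    # tokenizer.chat_template, when present) is read at load time.
    llm = Llama(model_path="models/Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)

    # The chat API formats the messages using the template from the metadata
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])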

Usage

  • Enter your message into the "Input text" box. Adjust the slider for "Max new tokens" as needed.
  • Under "Advanced options", adjust the settings for "Temperature", "Top k", "Top p", and "Repetition Penalty" as needed.
  • Press "Enter" on your keyboard or click the "Generate" button.
    • ⚠️ Note: If the model on the Hugging Face Hub has been updated, it may be re-downloaded when you generate.
  • Click the "Clear text" button to clear the chat history.
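
For reference, these sliders correspond to standard text-generation parameters. A minimal sketch of how such settings map onto a transformers generate() call; the model ID and values are illustrative and not taken from the app's source:

    # generation_params.py - how the UI sliders map onto generate() arguments
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"  # example from the model list above
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("Hello, how are you?", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,      # "Max new tokens"
        do_sample=True,
        temperature=0.7,         # "Temperature"
        top_k=50,                # "Top k"
        top_p=0.9,               # "Top p"
        repetition_penalty=1.1,  # "Repetition Penalty"
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))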

transformers tab

  • Enable the CPU execution checkbox to load the model with the argument device_map="cpu" (sketched below).
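
In transformers terms, this corresponds to passing device_map="cpu" when the model is loaded. A minimal sketch of that call, with an illustrative model ID rather than the app's actual loading code:

    # cpu_execution.py - what the "CPU execution" checkbox corresponds to
    from transformers import AutoModelForCausalLM

    # Keep all model weights on the CPU instead of any available GPU
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-mini-4k-instruct",  # example model ID
        device_map="cpu",
    )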

llama.cpp tab

  • Use the radio buttons under Default chat template to select the template that is applied when a GGUF model's metadata lacks a chat_template (see the sketch below).
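
As an illustration of that fallback, llama-cpp-python accepts a chat_format argument that forces a named template instead of relying on embedded metadata. A minimal sketch; the "llama-2" format name and the model path are examples, not the app's actual defaults:

    # default_chat_template.py - forcing a named chat format when the GGUF has none
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # example GGUF from the list above
        chat_format="llama-2",  # explicit template used instead of embedded metadata
    )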

options

  • When you enable the Translate (ja->en/en->ja) checkbox:
    • Any input in Japanese is automatically translated into English, and the English response is automatically translated back into Japanese (a standalone sketch of this kind of pipeline follows this list).
    • ⚠️ Note: Downloading the translation model for the first time may take some time.
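
The translation step can be reproduced in isolation with a transformers translation pipeline. The checkpoints below are assumptions used purely for illustration; the web UI may use different translation models:

    # translate_roundtrip.py - illustrative ja<->en translation with transformers pipelines
    from transformers import pipeline

    # Assumed FuguMT checkpoints, for illustration only
    ja_to_en = pipeline("translation", model="staka/fugumt-ja-en")
    en_to_ja = pipeline("translation", model="staka/fugumt-en-ja")

    print(ja_to_en("こんにちは、調子はどうですか？")[0]["translation_text"])
    print(en_to_ja("I'm doing well, thank you.")[0]["translation_text"])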

UI image

Model Credit

Developer     | Model                      | License
Microsoft     | Phi-3                      | The MIT License
Google        | Gemma                      | Gemma Terms of Use
NVIDIA        | Llama3-ChatQA              | Llama 3 Community License
Apple         | OpenELM                    | Apple sample code license
Rakuten       | RakutenAI                  | Apache License 2.0
rinna         | Youri                      | Llama 2 Community License
rinna         | Japanese GPT-NeoX          | The MIT License
Meta AI       | Llama 2                    | Llama 2 Community License
Sanji Watsuki | Kunoichi-7B                | CC-BY-NC-4.0
Stability AI  | StableLM                   | Apache License 2.0
Stability AI  | Japanese-StableLM-Instruct | Japanese Stablelm Research License Agreement
