
LLaMa2 GPTQ

A chat AI that answers with reference documents, using prompt engineering over a vector database. It also suggests related web pages through integration with my previous product, Texonom.

It pursues local, private, personal AI that needs no external API, made practical by optimizing inference performance with GPTQ model quantization. This project was inspired by LangChain projects such as notion-qa and localGPT.
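
At its core this is retrieval-augmented generation: the question is embedded, similar document chunks are fetched from a Chroma vector store, and those chunks are stitched into the prompt. A minimal sketch of the retrieval step, assuming the instructor-xl embeddings and db directory mentioned below (not necessarily this repo's exact code):

from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Assumed store path and embedding model for illustration
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)

# Pull the chunks most similar to the question and inline them in the prompt
docs = db.similarity_search("How does GPTQ quantization work?", k=4)
context = "\n\n".join(doc.page_content for doc in docs)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."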

Demos

CLI Demo

cli.mp4

Chat Demo

chat.mp4

Install

This project uses Rye as its package manager. It is currently only available with CUDA.

rye sync

or using pip

CUDA_VERSION=cu118
TORCH_VERSION=2.0.1
pip install torch==$TORCH_VERSION --index-url https://download.pytorch.org/whl/$CUDA_VERSION --force-reinstall
pip install .
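
After installing, it is worth confirming that the CUDA build of torch is the one that got picked up:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"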

QA

1. Chat with Web UI

streamlit run chat.py
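
chat.py is a Streamlit app. A rough, self-contained sketch of what such a chat page looks like, where the qa stub stands in for the repo's actual retrieval + GPTQ model call:

import streamlit as st

def qa(question: str) -> str:
    # Stub standing in for the repo's retrieval + quantized-LLM answer
    return f"(answer to: {question})"

st.title("LLaMa2 GPTQ")
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if question := st.chat_input("Ask about your documents"):
    st.chat_message("user").write(question)
    answer = qa(question)
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]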

2. Chat with CLI

python main.py chat
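
main.py exposes its subcommands (chat, process, quantize) through Fire, which maps methods to CLI commands. A minimal sketch of that wiring; the method bodies here are placeholders, not the repo's real logic:

import fire

class Cli:
    def chat(self):
        # Placeholder: the real command runs the interactive QA loop
        print("starting chat loop...")

    def process(self):
        # Placeholder: the real command ingests ./knowledge into the vector DB
        print("ingesting documents...")

if __name__ == "__main__":
    fire.Fire(Cli)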

Ingest Documents

Currently, the code structure mainly targets Notion's CSV-exported data.

Custom source documents

# Put document files into the ./knowledge folder
python main.py process
# Or use the provided Texonom DB
git clone https://huggingface.co/datasets/texonom/md-chroma-instructor-xl db
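
Conceptually, process loads the exported files, splits them into chunks, embeds each chunk, and persists the vectors to Chroma. A hedged sketch of that pipeline; the glob pattern and chunk sizes are assumptions:

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Assumed file glob and chunk sizes for illustration
docs = DirectoryLoader("knowledge", glob="**/*.md", loader_cls=TextLoader).load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
Chroma.from_documents(chunks, embeddings, persist_directory="db")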

Quantize Model

The default model is Orca 3B for now.

python main.py quantize --source_model facebook/opt-125m --output opt-125m-4bit-gptq --push
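
The quantize command wraps GPTQ post-training quantization. With AutoGPTQ the essential steps look roughly like this; the calibration sentence and config values are assumptions, and pushing to the Hub (the --push flag) is omitted:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

source = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(source)
config = BaseQuantizeConfig(bits=4, group_size=128)  # assumed 4-bit settings

model = AutoGPTQForCausalLM.from_pretrained(source, config)
# GPTQ needs a few calibration samples to estimate quantization error
examples = [tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("opt-125m-4bit-gptq")
tokenizer.save_pretrained("opt-125m-4bit-gptq")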

Future Plan

  • MPS support via dynamic model selection
  • Stateful web app support, like chat-langchain

App Stack

LLM Stack

Python Stack

  • Rye for package management
  • Mypy for type checking
  • Fire for CLI implementation
  • Streamlit for Web UI implementation