Langchain RAG model, with output streaming on Streamlit and using persistent VectorStore in disk

About

This project runs a local, LLM-agent-based RAG model on LangChain using LCEL (LangChain Expression Language) as well as the older chain style (RetrievalQA); see rag.py.
LCEL is used for inference in rag.py because its output is a smooth streaming generator, which Streamlit consumes with its 'write_stream' method.
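A minimal sketch of the two chain styles, assuming `llm`, `retriever`, and `prompt` are already built as in rag.py (exact import paths depend on your installed LangChain version):

    from langchain.chains import RetrievalQA
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    # Older style: RetrievalQA returns the full answer in one call.
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    result = qa.invoke({"query": "Who survived the Titanic?"})

    # LCEL style: the chain exposes .stream(), a generator of output chunks.
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    for chunk in chain.stream("Who survived the Titanic?"):
        print(chunk, end="", flush=True)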

The model uses a persistent ChromaDB vector store, built from all the PDF files in the data_source directory (one PDF about the Titanic is included for the demo).
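A rough sketch of how such a store can be built; directory names like 'chroma_db', the embedding model, and the chunking parameters here are illustrative, the actual values live in rag.py:

    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    docs = PyPDFDirectoryLoader("data_source").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # First run builds and persists the index; later runs can reopen it with
    # Chroma(persist_directory="chroma_db", embedding_function=embeddings).
    vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
    retriever = vectordb.as_retriever(search_kwargs={"k": 3})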

The UI is built on Streamlit, where the output of the RAG model is streamed token by token in a chat format; see st_app.py.
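The streaming UI boils down to a few Streamlit calls; a sketch, assuming the LCEL chain can be imported from rag.py (st.write_stream requires a recent Streamlit release):

    import streamlit as st
    from rag import chain  # assumed export; the actual name in rag.py may differ

    st.title("PDF Q&A")

    if question := st.chat_input("Ask something about the PDFs"):
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            # write_stream consumes the generator returned by chain.stream()
            st.write_stream(chain.stream(question))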


Note: The output can also be streamed to the terminal using callbacks.
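For example, passing a streaming callback handler when constructing the LlamaCpp LLM prints tokens to stdout as they are generated (the model path below is a placeholder):

    from langchain_community.llms import LlamaCpp
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    llm = LlamaCpp(
        model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",  # placeholder path
        callbacks=[StreamingStdOutCallbackHandler()],     # stream tokens to the terminal
        verbose=True,
    )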

LCEL - LangChain Expression Language:

LangChain's LCEL composes a chain of components with a Unix-pipe-like syntax:
    chain = retriever | prompt | llm | output_parser
See the implementation in rag.py.


For more, see the Pinecone LCEL article.

Environment Setup

  1. Clone the repo using git:

    git clone https://github.com/rauni-iitr/langchain_chromaDB_opensourceLLM_streamlit.git
  2. Create a virtual environment with 'venv' or 'conda' and activate it.

    python3 -m venv .venv
    source .venv/bin/activate
  3. This RAG application is built using a few dependencies:

    • pypdf -- for reading PDF documents
    • chromadb -- vector DB for creating the persistent vector store
    • transformers -- dependency of sentence-transformers, at least in this repository
    • sentence-transformers -- for embedding models that convert the PDF documents into vectors
    • streamlit -- to build the UI for PDF Q&A with the LLM
    • llama-cpp-python -- to load GGUF files for CPU inference of LLMs
    • langchain -- framework to orchestrate the vector DB and the LLM agent

    You can install all of these with pip:

    pip install pypdf chromadb langchain transformers sentence-transformers streamlit
  4. Installing llama-cpp-python:

    • This project uses llama-cpp-python (>= 0.1.83) for loading and inference of GGUF models; if you are using GGML models, you need llama-cpp-python <= 0.1.76.

    If you are going to use BLAS or Metal with llama-cpp for faster inference, the appropriate build flags need to be set:

    For NVIDIA GPU inference, use 'cuBLAS'; run the command below in your terminal:

    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

    For Apple Metal (M1/M2) based inference, use 'METAL'; run:

    CMAKE_ARGS="-DLLAMA_METAL=on"  FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

    For more info on setting the right flags for whatever device your app runs on, see here.

  5. Downloading GGUF/GGML models: the model needs to be downloaded and its path set in 'rag.py' (a sketch of how the path is used follows this step):

    • To run the app with an open-source LLM saved locally, download a model.

    • You can download any GGUF file based on your RAM specifications; 2-, 3-, 4- and 8-bit quantized models of Mistral-7B-v0.1, developed by Mistral AI, are available here.

      Note: You can also download other models in GGUF or GGML format, such as Llama-2 or other versions of Mistral, to run through llama-cpp. If you have access to a GPU, you can use GPTQ models as well (for better LLM performance), which can be loaded with other libraries such as transformers.
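As a hedged example of where the downloaded file plugs in, the LLM is typically constructed like this (a sketch only; the file name and parameter values in rag.py may differ):

    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",  # point this at your downloaded GGUF
        n_ctx=2048,       # context window
        n_gpu_layers=0,   # increase if llama-cpp-python was built with cuBLAS/Metal
        temperature=0.1,
    )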

Your setup to run the LLM app is now ready.

To run the model:

streamlit run st_app.py
