Langchain RAG model, with output streaming on Streamlit and using persistent VectorStore in disk

About

This project runs a local, LLM-agent-based RAG model on LangChain using LCEL (LangChain Expression Language) as well as the older chain style (RetrievalQA); see rag.py.
LCEL is used for inference in rag.py because its output is a smooth streaming generator, which Streamlit consumes with its 'write_stream' method.
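A minimal sketch of the two chain styles, assuming `llm`, `retriever`, and `prompt` are already built as in rag.py (exact import paths depend on your installed LangChain version):

    from langchain.chains import RetrievalQA
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough

    # Older style: RetrievalQA returns the full answer in one call.
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
    result = qa.invoke({"query": "Who survived the Titanic?"})

    # LCEL style: the chain exposes .stream(), a generator of output chunks.
    chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
    for chunk in chain.stream("Who survived the Titanic?"):
        print(chunk, end="", flush=True)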

The model uses a persistent ChromaDB vector store, built from all the PDF files in the data_source directory (one PDF about the Titanic is included for the demo).
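A rough sketch of how such a store can be built; directory names like 'chroma_db', the embedding model, and the chunking parameters here are illustrative, the actual values live in rag.py:

    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    docs = PyPDFDirectoryLoader("data_source").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # First run builds and persists the index; later runs can reopen it with
    # Chroma(persist_directory="chroma_db", embedding_function=embeddings).
    vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
    retriever = vectordb.as_retriever(search_kwargs={"k": 3})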

The UI is built on Streamlit, where the output of the RAG model is streamed token by token in a chat format; see st_app.py.
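The streaming UI boils down to a few Streamlit calls; a sketch, assuming the LCEL chain can be imported from rag.py (st.write_stream requires a recent Streamlit release):

    import streamlit as st
    from rag import chain  # assumed export; the actual name in rag.py may differ

    st.title("PDF Q&A")

    if question := st.chat_input("Ask something about the PDFs"):
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            # write_stream consumes the generator returned by chain.stream()
            st.write_stream(chain.stream(question))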


Note: The output can also be streamed to the terminal using callbacks.
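For example, passing a streaming callback handler when constructing the LlamaCpp LLM prints tokens to stdout as they are generated (the model path below is a placeholder):

    from langchain_community.llms import LlamaCpp
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

    llm = LlamaCpp(
        model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",  # placeholder path
        callbacks=[StreamingStdOutCallbackHandler()],     # stream tokens to the terminal
        verbose=True,
    )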

LCEL - LangChain Expression Language:

LangChain's LCEL composes a chain of components with a Unix-pipe-like syntax:
    chain = retriever | prompt | llm | output_parser
See the implementation in rag.py.


For more, see the Pinecone LCEL article.

Environment Setup

  1. Clone the repo using git:

    git clone https://github.com/rauni-iitr/langchain_chromaDB_opensourceLLM_streamlit.git
  2. Create a virtual environment with 'venv' or 'conda' and activate it.

    python3 -m venv .venv
    source .venv/bin/activate
  3. This RAG application is built using a few dependencies:

    • pypdf -- for reading PDF documents
    • chromadb -- vector DB for creating the persistent vector store
    • transformers -- dependency of sentence-transformers, at least in this repository
    • sentence-transformers -- for embedding models that convert the PDF documents into vectors
    • streamlit -- to build the UI for PDF Q&A with the LLM
    • llama-cpp-python -- to load GGUF files for CPU inference of LLMs
    • langchain -- framework to orchestrate the vector DB and the LLM agent

    You can install all of these with pip:

    pip install pypdf chromadb langchain transformers sentence-transformers streamlit
  4. Installing llama-cpp-python:

    • This project uses llama-cpp-python (>= 0.1.83) for loading and inference of GGUF models; if you are using GGML models, you need llama-cpp-python <= 0.1.76.

    If you are going to use BLAS or Metal with llama-cpp for faster inference, the appropriate build flags need to be set:

    For NVIDIA GPU inference, use 'cuBLAS'; run the command below in your terminal:

    CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

    For Apple Metal (M1/M2) based inference, use 'METAL'; run:

    CMAKE_ARGS="-DLLAMA_METAL=on"  FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

    For more info on setting the right flags for whatever device your app runs on, see here.

  5. Downloading GGUF/GGML models: the model needs to be downloaded and its path set in 'rag.py' (a sketch of how the path is used follows this step):

    • To run the app with an open-source LLM saved locally, download a model.

    • You can download any GGUF file based on your RAM specifications; 2-, 3-, 4- and 8-bit quantized models of Mistral-7B-v0.1, developed by Mistral AI, are available here.

      Note: You can also download other models in GGUF or GGML format, such as Llama-2 or other versions of Mistral, to run through llama-cpp. If you have access to a GPU, you can use GPTQ models as well (for better LLM performance), which can be loaded with other libraries such as transformers.
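As a hedged example of where the downloaded file plugs in, the LLM is typically constructed like this (a sketch only; the file name and parameter values in rag.py may differ):

    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="models/mistral-7b-v0.1.Q4_K_M.gguf",  # point this at your downloaded GGUF
        n_ctx=2048,       # context window
        n_gpu_layers=0,   # increase if llama-cpp-python was built with cuBLAS/Metal
        temperature=0.1,
    )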

Your setup to run the LLM app is now ready.

To run the model:

streamlit run st_app.py
