Welcome to the Retrieval Augmented Generation (RAG) repository! This project empowers users to perform Question-Answering (QnA) tasks over their own documents using the state-of-the-art RAG technique. By combining open-source Large Language Models (LLMs), LangChain, and FastAPI, we provide a powerful and user-friendly platform for document-based QnA.
RAG Pipeline source.
In this section, we'll guide you through setting up and running RAG for your document-based QnA. Follow these steps to get started:
Create a virtual Python environment in your local directory and activate it.
python3.9 -m venv llm_env/
source llm_env/bin/activate
- Clone this repository to your local machine.
git clone https://github.com/AshishSinha5/rag_api.git
cd rag_api
- Install the required Python packages.
pip install -r requirements.txt
- The project currently uses the plain C/C++ implementation of the Llama 2 model from the llama.cpp repository. The model can be downloaded from TheBloke's HuggingFace page.
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
We'll be using the Swagger UI (which comes bundled with the FastAPI library) to interact with our API.
cd src/rag_app
uvicorn main:app
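By default, uvicorn serves the app on 127.0.0.1:8000. If you need a different host or port, or want auto-reload while developing, you can pass the standard uvicorn flags (the values below are just examples):

uvicorn main:app --host 0.0.0.0 --port 8000 --reload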
In your favorite browser, go to the following link -
http://127.0.0.1:8000/docs
To upload our document we'll send a POST request (an example curl request is shown below). During the upload procedure the following parameters are required -

- `collection_name` - Name of the vector db where you want to upload your document. A new `db` will be created if it doesn't exist already, otherwise the document will be appended to the existing `db`.
- `file` - File to be uploaded. Currently only `pdf` and `html` files are supported.
Uploading Documents to vector_db.
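For example, a document upload from the command line might look like the request below. The endpoint path (`upload/`), the collection name `my_docs`, and the file name `my_document.pdf` are placeholders/assumptions - check the exact path and request schema in the Swagger UI at `/docs` before running it.

# NOTE: the endpoint path and parameter placement are assumptions - verify them in the Swagger UI
curl -X POST "http://127.0.0.1:8000/upload/?collection_name=my_docs" -H "accept: application/json" -F "file=@my_document.pdf"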
To perform the QnA on our documents we'll hit the `query/` endpoint of our API. We'll need the following parameters to perform our query -

- `query` - The query string.
- `n_result` - Number of most similar document chunks to load from our `vector_db` to create the relevant context for our `query`.
- `collection_name` - Name of the `vector_db` we want to query.
Query Documents Using the LLM.
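For example, a query from the command line might look like the request below. The HTTP method, the placement of the parameters in the URL query string, and the example values are assumptions - confirm the exact schema in the Swagger UI.

# NOTE: the HTTP method and parameter placement are assumptions - confirm in the Swagger UI
curl -X POST "http://127.0.0.1:8000/query/?query=What%20is%20this%20document%20about%3F&n_result=3&collection_name=my_docs" -H "accept: application/json"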
As we start our application the llama.cpp LLM gets initialized with the default parameters. But we may wish to configure the LLM model to our liking. We can use the `init_llm/` endpoint to configure the model. Currently the following parameters are available to configure -

- `n_gpu_layers` - Number of layers to load on the GPU.
- `n_ctx` - Token context window.
- `n_batch` - Number of tokens to process in parallel. Should be a number between 1 and `n_ctx`.
- `max_tokens` - The maximum number of tokens to generate.
- `temperature` - Temperature for sampling. Higher values mean more random samples.
As the `llama.cpp` model allows for more configurable parameters, they may be added in the future.
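For example, re-initializing the LLM from the command line might look like the request below. The parameter values are only illustrative, and passing them as URL query parameters (rather than a JSON body) is an assumption - verify the exact schema in the Swagger UI.

# NOTE: parameter placement and example values are assumptions - verify in the Swagger UI
curl -X POST "http://127.0.0.1:8000/init_llm/?n_gpu_layers=0&n_ctx=2048&n_batch=512&max_tokens=256&temperature=0.7" -H "accept: application/json"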