Chat With PDFs

Chat with your PDF files for free, using Langchain, Groq, the Chroma vector store, and Jina AI embeddings. This repository contains a simple Python implementation of a RAG (Retrieval-Augmented Generation) system: it retrieves the chunks of your PDF that are most relevant to each query and uses them to generate an informative response.
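
To make the flow concrete, here is a minimal sketch of this kind of RAG pipeline built from the same libraries. It is not the repository's exact code: the file path, chunk size, model names, and prompt are illustrative assumptions, while PyPDFLoader, JinaEmbeddings, Chroma, and ChatGroq are the standard Langchain integrations for these services.

    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import JinaEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain_groq import ChatGroq
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load a PDF and split it into chunks the embedding model can handle.
    docs = PyPDFLoader("documents/example.pdf").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1024).split_documents(docs)

    # 2. Embed the chunks and index them in a persistent Chroma collection.
    #    JinaEmbeddings reads JINA_API_KEY from the environment.
    store = Chroma.from_documents(
        chunks,
        JinaEmbeddings(model_name="jina-embeddings-v2-base-en"),
        persist_directory="vectorstore",
    )

    # 3. Retrieve the most similar chunks and let the Groq-hosted LLM answer from them.
    #    ChatGroq reads GROQ_API_KEY from the environment.
    query = "What does the document say about warranty terms?"
    context = "\n\n".join(d.page_content for d in store.similarity_search(query, k=4))
    llm = ChatGroq(model="llama3-8b-8192", temperature=0)
    print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}").content)

The constants described under Configuration below control the same knobs: model names, temperature, chunk size, and storage directories.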

Installation

Follow these steps:

  1. Clone the repository
    git clone https://github.com/S4mpl3r/chat-with-pdf.git
    
  2. Create a virtual environment and activate it (optional, but highly recommended).
    python -m venv .venv
    Windows: .venv\Scripts\activate
    Linux: source .venv/bin/activate
    
  3. Install required packages:
    python -m pip install -r requirements.txt
    
  4. Create a .env file in the root of the project and populate it with the following values (you will need to obtain your own API keys):
    JINA_API_KEY=<YOUR KEY>
    GROQ_API_KEY=<YOUR KEY>
    HF_TOKEN=<YOUR TOKEN>
    HF_HOME=<PATH TO STORE HUGGINGFACE MODEL>
    
  5. Run the program:
    python main.py
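
If you want to sanity-check your .env before running the program, a minimal sketch is shown below. It assumes the keys are loaded with python-dotenv (the usual pattern for Langchain apps); the check itself is not part of the repository.

    import os
    from dotenv import load_dotenv  # assumes python-dotenv is available via requirements.txt

    load_dotenv()  # copies the .env entries into the process environment
    for key in ("JINA_API_KEY", "GROQ_API_KEY", "HF_TOKEN"):
        assert os.getenv(key), f"{key} is missing from .env"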
    

Configuration

You can customize the behavior of the system by modifying the constants and parameters at the top of the main.py file (a hedged example follows this list):

  • EMBED_MODEL_NAME: Specify the name of the Jina embedding model to be used.
  • LLM_NAME: Specify the name of the language model (Refer to Groq for the list of available models).
  • LLM_TEMPERATURE: Set the temperature parameter for the language model.
  • CHUNK_SIZE: Specify the maximum chunk size allowed by the embedding model.
  • DOCUMENT_DIR: Specify the directory where PDF documents are stored.
  • VECTOR_STORE_DIR: Specify the directory where vector embeddings are stored.
  • COLLECTION_NAME: Specify the name of the collection for the Chroma vector store.
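
As a rough illustration only (every value below is an assumption; check main.py for the repository's actual defaults), the constants might look like this:

    EMBED_MODEL_NAME = "jina-embeddings-v2-base-en"  # Jina embedding model
    LLM_NAME = "llama3-8b-8192"                      # any chat model available on Groq
    LLM_TEMPERATURE = 0.0                            # 0 = most deterministic answers
    CHUNK_SIZE = 1024                                # upper bound set by the embedding model
    DOCUMENT_DIR = "documents"                       # where your input PDFs live
    VECTOR_STORE_DIR = "vectorstore"                 # where Chroma persists the embeddings
    COLLECTION_NAME = "pdf_chunks"                   # Chroma collection to write and query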

Resources

Kudos to the amazing libraries and services this project builds on: Langchain, Groq, Chroma, and Jina AI.

License

MIT