kamil271e/ars

article-retrieval-system

An efficient RAG retrieval system for article fragments from the Kaggle dataset available here. The system supports retrieval from popular vector stores and includes Question Answering functionality backed by Large Language Models (LLMs). By default, it uses FAISS as the vector store and Mixtral-8x7B as the LLM. See the report for more details.
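To illustrate what retrieving article fragments means, here is a minimal sketch of top-k similarity search. It is not the project's implementation: it swaps FAISS and a learned embedding model for a toy bag-of-words vector and a brute-force cosine ranking using only the standard library. The names `embed`, `cosine`, and `retrieve` and the sample fragments are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, fragments: list[str], k: int = 2) -> list[str]:
    """Return the k fragments most similar to the query, best match first.

    Brute-force ranking; a vector store such as FAISS replaces this step
    with an index built once over all fragment embeddings.
    """
    query_vec = embed(query)
    ranked = sorted(fragments, key=lambda f: cosine(query_vec, embed(f)), reverse=True)
    return ranked[:k]


# Hypothetical fragments, for illustration only.
fragments = [
    "Gradient descent minimizes a loss function step by step",
    "Docker images package an application with its dependencies",
    "Vector stores index embeddings for similarity search",
]
print(retrieve("how do vector stores search embeddings", fragments, k=1))
```

In a Q/A setting, the retrieved fragments would then be passed to the LLM as context for the user's question.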

Prerequisites

Data loading

To download the dataset, you can choose one of two options:

  1. Download it manually from link, create a folder named data in the project root directory, and store the medium.csv file in that folder.
  2. Use the Kaggle API: download your account token from this link and overwrite the existing kaggle.json file with it.
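Whichever option you choose, a quick check that the file is in place can save a failed start-up. The following is a minimal sketch, assuming the dataset ends up at data/medium.csv relative to the project root, as in option 1; the helper names `dataset_path` and `read_header` are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path


def dataset_path(project_root: str = ".") -> Path:
    """Expected dataset location (assumption: data/medium.csv under the project root)."""
    return Path(project_root) / "data" / "medium.csv"


def read_header(csv_path: Path) -> list[str]:
    """Return the CSV column names, or raise if the file is missing."""
    if not csv_path.is_file():
        raise FileNotFoundError(f"{csv_path} not found; download the dataset first")
    with csv_path.open(newline="", encoding="utf-8") as handle:
        # The first row of the file holds the column names.
        return next(csv.reader(handle), [])
```

Calling `read_header(dataset_path())` from the project root either lists the columns or fails with a message pointing back to this step.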

User Access Token

This step is optional, but it is required if you want to use the Q/A system with Large Language Model support. Generate a HuggingFaceHub API token in your HuggingFace account, copy it, and paste it into the .env file, replacing the <YOUR_TOKEN> placeholder.
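As a sanity check, the sketch below reads the token from .env and treats an untouched placeholder as "no token". The variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption about what the .env file uses; check the file itself for the actual key. The helper `read_token` is hypothetical and uses only the standard library.

```python
from pathlib import Path
from typing import Optional

PLACEHOLDER = "<YOUR_TOKEN>"


def read_token(env_path: str = ".env",
               key: str = "HUGGINGFACEHUB_API_TOKEN") -> Optional[str]:
    """Return the token stored under `key`, or None if absent or still the placeholder.

    Assumption: the file uses simple KEY=VALUE lines; the key name is a guess.
    """
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            value = value.strip().strip("'\"")
            return value if value and value != PLACEHOLDER else None
    return None
```

If `read_token()` returns `None`, the retrieval part can still run, but the LLM-backed Q/A would have no credentials to work with.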

Running options

Local

pip install -r requirements.txt
chmod +x kaggle.sh
./kaggle.sh
streamlit run app.py

Docker

sudo docker build -t ars-app:latest .
sudo docker container run -it -p 8501:8501 ars-app:latest

Demo

[Demo recording: ars_demo]