A Retrieval-Augmented Generation (RAG) system built on a pretrained GPT model that analyzes and predicts the November 2024 US General Elections, using news sources (CNN, Fox News, Politico, and NPR) as context

deramos/USElections-GPT

US-Elections-GPT:

Analyzing and Predicting US Election Trends with RAG-enabled GPT

Header Image

This project leverages Generative Pre-trained Transformers (GPT) to understand, analyze, and predict trends in US elections. It sources news articles from outlets such as Fox News, CNN, NPR, and Politico, ensuring that whether you are liberal or conservative, your political views can be represented in the answers generated by the LLM.

Architecture Design

A simplified architecture diagram of the product is shown below:

Simplified Architecture Diagram

SERVICES

SCRAPER SERVICE & MONGODB

First, news articles are scraped from the news sources and loaded into MongoDB. To ensure that all sides of the political spectrum are represented in the LLM, right-wing, left-wing, and centrist news sources are scraped. Users can select which side of the political spectrum they prefer when querying the LLM.
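
The scrape-and-store step might look like the sketch below. The collection name, field names, and the `leaning` tag are illustrative assumptions, not the repo's actual schema; keying documents by URL is one way to keep re-scrapes from creating duplicates.

```python
# Hypothetical sketch of shaping a scraped article and upserting it into MongoDB.
from datetime import datetime, timezone

def make_article_doc(url: str, title: str, body: str, leaning: str) -> dict:
    """Shape a scraped article for insertion into MongoDB."""
    return {
        "_id": url,                # the URL is a natural key, so re-scrapes upsert
        "title": title,
        "body": body,
        "leaning": leaning,        # assumed tag: "left", "right", or "center"
        "scraped_at": datetime.now(timezone.utc),
    }

def store_article(doc: dict) -> None:
    """Upsert one article document (requires a running MongoDB and pymongo)."""
    from pymongo import MongoClient
    client = MongoClient("mongodb://localhost:27017")
    # replace_one with upsert=True overwrites instead of duplicating
    client.elections.articles.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```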

AIRFLOW, NEWS SUMMARY, & NAMED ENTITY EXTRACTION

Airflow is used with Papermill and Jupyter Notebook to summarize news articles with a HuggingFace summarization pipeline, extract named entities with spaCy, and compute vector embeddings with a HuggingFace SentenceTransformer model. The news articles, their summaries, and the extracted entities (PERSON, ORG, and LOCATION) are loaded into a ChromaDB datastore.
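
A condensed sketch of that enrichment step is below. Model names, the collection name, and the metadata layout are assumptions (the repo's notebooks may differ); note that spaCy's location labels are GPE/LOC, and ChromaDB metadata values must be scalars, so entity groups are joined into strings.

```python
# Sketch of summarize -> NER -> embed -> load into ChromaDB.
from collections import defaultdict

def group_entities(ents) -> dict:
    """Group (text, label) pairs from spaCy into ChromaDB-friendly metadata."""
    grouped = defaultdict(set)
    for text, label in ents:
        if label in {"PERSON", "ORG", "GPE", "LOC"}:
            grouped[label].add(text)
    # ChromaDB metadata values must be scalars, so join each group into a string
    return {label: ", ".join(sorted(names)) for label, names in grouped.items()}

def enrich_and_store(article_id: str, article_text: str) -> None:
    """One article through the pipeline (requires transformers, spacy,
    sentence-transformers, and chromadb)."""
    from transformers import pipeline
    from sentence_transformers import SentenceTransformer
    import spacy
    import chromadb

    summary = pipeline("summarization")(article_text, max_length=120)[0]["summary_text"]
    nlp = spacy.load("en_core_web_sm")
    meta = group_entities((e.text, e.label_) for e in nlp(article_text).ents)
    vec = SentenceTransformer("all-MiniLM-L6-v2").encode(summary).tolist()

    chromadb.Client().get_or_create_collection("articles").add(
        ids=[article_id], embeddings=[vec], documents=[summary], metadatas=[meta]
    )
```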

REDIS CACHE

The news article URLs are cached in Redis to ensure that each webpage is visited and scraped only once.
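
One common way to implement this check is Redis's SET with the NX flag, which succeeds only when the key does not yet exist; the key scheme below is an assumption. A tiny in-memory stand-in is included for illustration in place of a live Redis server.

```python
# Dedup check with SET NX semantics (redis-py: cache.set(..., nx=True)
# returns True when the key was newly created, None otherwise).
def should_scrape(cache, url: str) -> bool:
    """Return True the first time a URL is seen; False afterwards."""
    return bool(cache.set(f"seen:{url}", 1, nx=True))

class FakeRedis:
    """Minimal in-memory stand-in for redis.Redis, for illustration only."""
    def __init__(self):
        self._store = {}

    def set(self, name, value, nx=False):
        if nx and name in self._store:
            return None
        self._store[name] = value
        return True

# In production this would be: cache = redis.Redis(host="localhost", port=6379)
cache = FakeRedis()
assert should_scrape(cache, "https://example.com/a") is True   # first visit
assert should_scrape(cache, "https://example.com/a") is False  # already scraped
```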

CHAT SERVICE & LANGCHAIN

The chat service is created using LangChain and a HuggingFace pipeline. First, a Mistral 7B model (which may be substituted with a different model in the future) is quantized to 4-bit precision using bitsandbytes; this reduces the model's memory footprint and speeds up inference. LangChain then combines the quantized model with ChromaDB, where the news article vector embeddings are stored, to form the RAG pipeline. Redis is used to store chat sessions so that conversations can be resumed with ease.
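
The 4-bit load and the prompt assembly could look roughly like this sketch. The model ID, quantization settings, and prompt wording are assumptions; the repo may wire these differently through LangChain's own chain classes.

```python
# Sketch of loading Mistral 7B in 4-bit and building a RAG prompt.
def build_prompt(context_docs, question: str) -> str:
    """Assemble a RAG prompt from retrieved articles and the user question."""
    context = "\n\n".join(context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def load_quantized_model():
    """Load the model in 4-bit (requires torch, transformers, bitsandbytes, a GPU)."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat weights
        bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for faster inference
    )
    return AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-Instruct-v0.2",  # assumed model ID
        quantization_config=quant,
        device_map="auto",
    )
```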

FASTAPI & WEBSOCKET

A WebSocket server is created using FastAPI to serve the LangChain model.
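
A minimal version of such an endpoint is sketched below. The `/chat` route and the plain text request/response framing are assumptions; the repo's actual message format (and its session handling via Redis) may differ.

```python
# Minimal FastAPI WebSocket endpoint serving an answer function.
def create_app(answer_fn):
    """Build a FastAPI app that answers chat queries over a WebSocket
    (requires fastapi, and uvicorn to serve)."""
    from fastapi import FastAPI, WebSocket

    app = FastAPI()

    @app.websocket("/chat")  # assumed route name
    async def chat(ws: WebSocket):
        await ws.accept()
        while True:
            question = await ws.receive_text()       # user query
            await ws.send_text(answer_fn(question))  # RAG chain's answer

    return app

# To serve: uvicorn.run(create_app(chain.invoke), host="0.0.0.0", port=8000)
```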

MODEL OPTIMIZATIONS

To ensure accurate retrieval during chat queries, several LLM optimizations were applied. These include small-to-big retrieval, reranking, and metadata search over the extracted named entities.
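
Two of these can be illustrated with pure-Python stand-ins: small-to-big retrieval (search over small summary chunks, then return the full parent article) and reranking. The keyword-overlap scorer below is only a stand-in for a real cross-encoder reranker, and all names are hypothetical.

```python
# Illustrative small-to-big retrieval and a stand-in reranker.
def small_to_big(hit_ids, parent_of, articles):
    """Map retrieved chunk ids to their full parent articles, preserving order."""
    seen, out = set(), []
    for chunk_id in hit_ids:
        parent_id = parent_of[chunk_id]
        if parent_id not in seen:          # each parent article only once
            seen.add(parent_id)
            out.append(articles[parent_id])
    return out

def rerank(query: str, docs):
    """Order candidates by term overlap with the query (stand-in scorer;
    a real system would use a cross-encoder here)."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))
```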

TO RUN LOCALLY

Rename .env.example to .env and fill in the variables. Run docker compose build to build the images, then docker compose up to start the services. To run on Kubernetes, see the deployment.yaml file.
