
SciPhi-AI/R2R


Docs Discord Github Stars Commits-per-week License: MIT

SciPhi Framework

Build, deploy, and optimize your RAG system.

About

R2R (RAG to Riches) offers a fast and efficient framework for serving high-quality Retrieval-Augmented Generation (RAG) to end users. The framework is designed with customizable pipelines and a feature-rich FastAPI implementation, enabling developers to quickly deploy and scale RAG-based applications.

Why?

R2R was conceived to bridge the gap between local LLM experimentation and scalable, production-ready solutions. R2R is to LangChain/LlamaIndex what Next.js is to React. A JavaScript client for R2R deployments can be found here.

Key Features

  • πŸš€ Deploy: Instantly launch production-ready RAG pipelines with streaming capabilities.
  • 🧩 Customize: Tailor your pipeline with intuitive configuration files.
  • πŸ”Œ Extend: Enhance your pipeline with custom code integrations.
  • βš–οΈ Autoscale: Scale your pipeline effortlessly in the cloud using SciPhi.
  • πŸ€– OSS: Benefit from a framework developed by the open-source community, designed to simplify RAG deployment.

Table of Contents

  1. Demo(s)
  2. Links
  3. Quick Install
  4. Docker
  5. Q&A Example
  6. HyDE Example
  7. Running Local RAG
  8. Core Abstractions

Demo(s)

Using the cloud application to deploy the pre-built basic pipeline:

https://www.loom.com/share/e3b934b554484787b005702ced650ac9

Note - the example above uses SciPhi Cloud to pair with the R2R framework for deployment and observability. SciPhi is working to launch a self-hosted version of their cloud platform as R2R matures.

Links

Join the Discord server

R2R Docs Quickstart

SciPhi Cloud Docs

Local RAG Tutorial

Quick Install

# use `pip install 'r2r[all]'` to install all optional dependencies
pip install 'r2r[eval]'

# setup env 
export OPENAI_API_KEY=sk-...
# Set `LOCAL_DB_PATH` for local testing
export LOCAL_DB_PATH=local.sqlite # robust providers available (e.g. qdrant, pgvector, ..)

# OR do `vim .env.example && cp .env.example .env`
# INCLUDE secrets and modify config.json
# if using cloud providers (e.g. pgvector, qdrant, ...)
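Before launching a server, the environment set up above can be sanity-checked. A minimal sketch, assuming only the variable names shown in the export commands above (`OPENAI_API_KEY`, `LOCAL_DB_PATH`); the helper itself is illustrative, not part of R2R:

```python
import os

REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL_DEFAULTS = {"LOCAL_DB_PATH": "local.sqlite"}

def check_env(env=os.environ):
    """Return resolved settings, raising if a required key is missing."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing env vars: {', '.join(missing)}")
    settings = {k: env[k] for k in REQUIRED}
    for key, default in OPTIONAL_DEFAULTS.items():
        # fall back to the same default the export example above uses
        settings[key] = env.get(key, default)
    return settings
```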

Docker

docker pull emrgntcmplxty/r2r:latest

# Choose from CONFIG_OPTION in {`default`, `local_ollama`}
# For cloud deployment, select `default` and pass `--env-file .env`
# For local deployment, select `local_ollama`
docker run -d --name r2r_container -p 8000:8000 -e CONFIG_OPTION=local_ollama  emrgntcmplxty/r2r:latest
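Once the container is up, it can be polled for readiness before sending requests. A hedged sketch using only the standard library; the port matches the `-p 8000:8000` mapping above, and `/docs` is FastAPI's default interactive-docs route (assumed to be enabled here):

```python
import time
import urllib.error
import urllib.request

def wait_ready(url, attempts=10, delay=1.0):
    """Poll `url` until it answers; True on success, False after `attempts` tries."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
    return False

# e.g. wait_ready("http://localhost:8000/docs") after the `docker run` above
```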

Q&A Example

Configurable Pipeline: Execute this script to select and serve a Q&A RAG, Web RAG, or Agent RAG pipeline. This starter pipeline supports ingestion, embedding, and the selected RAG flavor, all accessible via a REST API.

# launch the server
# For ex., do `export CONFIG_OPTION=local_ollama` or `--config=local_ollama` to run fully locally
# For ex., do `export PIPELINE_OPTION=web` or `--pipeline=web` to run WebRAG pipeline
python -m r2r.examples.servers.configurable_pipeline --config=default --pipeline=qna

Question & Answer Client: Run this client script after starting the server above with `--pipeline=qna`. It uploads text entries and PDFs to the server through the Python client and demonstrates managing document- and user-level vectors with the client's built-in features.
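The search results this client returns carry per-document metadata such as `user_id` and `text` (the record shape mirrors the JSON shown in the streaming example below). A toy sketch of grouping such records per user on the client side; the helper itself is illustrative, not part of the R2R client API:

```python
from collections import defaultdict

def group_by_user(results):
    """Bucket search results by the user_id stored in each record's metadata."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["metadata"]["user_id"]].append(
            {"id": r["id"], "score": r["score"], "text": r["metadata"]["text"]}
        )
    return dict(grouped)
```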

# run the client

# ingest the default documents (Lyft 10k)
python -m r2r.examples.clients.qna_rag_client ingest

python -m r2r.examples.clients.qna_rag_client search --query="What was lyfts profit in 2020?"

# Result 1: Title: Lyft 10k 2021
# Net loss was $1.0 billion, a decrease of 42% and 61% compared to 2020 and 2019, respectively.
# Adjusted EBITDA was $92.9 million, marking the Company's first annual Adjusted EBITDA profit.
# Cash used in operating activities was $101.7 million.
# Unrestricted cash and cash equivalents and short-term investments totaled $2.3 billion as of December 31, 2021. Impact of COVID-19 to our Business
# The


# Result 2: Title: Lyft 10k 2021
# Total revenue was $3.2 billion, an increase of 36% year-over-year.
# Total costs and expenses were $4.3 billion, including stock-based compensation expense of $724.6 million and insurance costs related to changes attributable to historical periods of $250.3 million.
# Loss from operations was $1.1 billion.
# Other income was $135.9 million, including a pre-tax gain of $119.3 million as a result of the gain on the transaction with Woven Planet.

# ... 

python -m r2r.examples.clients.qna_rag_client rag_completion_streaming --query="What was lyfts profit in 2020?"

# <search>[{"id": "a0f6b427-9083-5ef2-aaa1-024b6cebbaee", "score": 0.6862949051074227, "metadata": {"user_id": "df7021ed-6e66-5581-bd69-d4e9ac1e5ada", "pipeline_run_id": "0c2c9a81-0720-4e34-8736-b66189956013", "text": "Title: Lyft 10k 2021\nNet loss was $ ... </search>
#
# <context> Title: Lyft 10k 2021 ... </context>
#
# <completion>Lyft's net loss in 2020 was $1.8 billion.</completion>
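The streamed response above interleaves `<search>`, `<context>`, and `<completion>` sections. A minimal sketch of pulling each section out of the accumulated stream text once it has arrived; the tag names come straight from the output above, while the parser itself is illustrative:

```python
import re

def parse_streamed_response(text):
    """Extract the tagged sections emitted by the streaming endpoint."""
    sections = {}
    for tag in ("search", "context", "completion"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if m:
            sections[tag] = m.group(1).strip()
    return sections
```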

HyDE Example

HyDE Pipeline: Execute this script to start a backend server equipped with a more advanced synthetic-query pipeline. This pipeline generates synthetic queries, enhancing the RAG system's retrieval and performance.

# launch the server
python -m r2r.examples.servers.configurable_pipeline --config=default --pipeline=hyde

# run the client
# ingests Lyft 10K, Uber 10K, and others
python -m r2r.examples.clients.qna_rag_client ingest --document_filter=all

python -m r2r.examples.clients.qna_rag_client search --query="What was lyft and ubers profit in 2020?"

# {... 'message': {'content': 'In 2020, Lyft reported a net loss of $1.7529 billion [8]. Uber also reported a significant loss for the year 2020, with its net loss improving by $1.8 billion from 2020, indicating a substantial loss for the year as well [38]. Neither company achieved a profit in 2020; instead, they both experienced considerable losses.' ...}
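HyDE (Hypothetical Document Embeddings) retrieves by embedding an LLM-generated hypothetical answer instead of the raw query, so documents are matched answer-to-answer rather than question-to-answer. A toy sketch of the idea with a stubbed LLM and bag-of-words vectors; everything here is illustrative, not R2R's implementation:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline uses a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(query, corpus, fake_llm):
    """Embed a hypothetical answer to the query, then rank the corpus against it."""
    hypothetical = fake_llm(query)  # stand-in for an LLM completion
    qvec = embed(hypothetical)
    return max(corpus, key=lambda doc: cosine(qvec, embed(doc)))
```

The hypothetical answer shares vocabulary with the target passage ("net loss", "billion") that the bare question lacks, which is the gain HyDE aims for.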

Running Local RAG

Refer here for a tutorial on how to modify the commands above to use local providers.

Core Abstractions

The framework primarily revolves around four core abstractions:

  • The Ingestion Pipeline: Facilitates the preparation of embeddable 'Documents' from various data formats (json, txt, pdf, html, etc.). The abstraction can be found in ingestion.py and relevant documentation is available here.

  • The Embedding Pipeline: Manages the transformation of text into stored vector embeddings, interacting with embedding and vector database providers through a series of steps (e.g., extract_text, transform_text, chunk_text, embed_chunks, etc.). The abstraction can be found in embedding.py and relevant documentation is available here.

  • The RAG Pipeline: Works similarly to the embedding pipeline but incorporates an LLM provider to produce text completions. The abstraction can be found in rag.py and relevant documentation is available here.

  • The Eval Pipeline: Samples some subset of rag_completion calls for evaluation. Currently DeepEval and Parea are supported. The abstraction can be found in eval.py and relevant documentation is available here.
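The embedding pipeline's named steps can be sketched as a simple composition. The step names follow the list above; the bodies are toy stand-ins, not R2R's code:

```python
def extract_text(raw):
    return raw.strip()

def transform_text(text):
    return " ".join(text.split())  # e.g. whitespace normalization

def chunk_text(text, size=5):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed_chunks(chunks, embed_fn):
    # embed_fn stands in for an embedding-provider call
    return [(chunk, embed_fn(chunk)) for chunk in chunks]

def run_embedding_pipeline(raw, embed_fn):
    return embed_chunks(chunk_text(transform_text(extract_text(raw))), embed_fn)
```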

Each pipeline incorporates a logging database for operation tracking and observability.
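The per-pipeline operation logging described above could be approximated with a small decorator writing to SQLite. A sketch only; R2R's actual logging database, schema, and hooks are not shown here:

```python
import functools
import sqlite3
import time

def logged(conn):
    """Record each wrapped call's name and duration in a pipeline_log table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pipeline_log "
        "(op TEXT, started REAL, duration REAL)"
    )
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return fn(*args, **kwargs)
            finally:
                # log even when the step raises, for observability
                conn.execute(
                    "INSERT INTO pipeline_log VALUES (?, ?, ?)",
                    (fn.__name__, start, time.time() - start),
                )
                conn.commit()
        return wrapper
    return decorator
```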