The project aims to develop a conversational retrieval system using natural language processing techniques. It leverages the OpenAI GPT-3.5 Turbo model and various language processing libraries to create a chat-based interface for retrieving relevant information from a dataset.
To run the code successfully, please follow these steps:
Install the required dependencies by running pip install -r requirements.txt. Set up the necessary environment variables: OPENAI_API_KEY: OpenAI API key required for accessing the language model. ACTIVELOOP_TOKEN: Activeloop token for dataset management. ACTIVELOOP_USERNAME: Activeloop username for dataset management.
The code consists of the following main components:
The load_folder function loads documents from a specified folder and applies text splitting to segment them into smaller chunks. The resulting chunks are then passed to the upload_data function, which adds them to the dataset.
The embeddings module provides an interface to the OpenAI GPT-3.5 Turbo model, allowing for text embeddings generation. The DeepLake class serves as a vector store, enabling efficient storage and retrieval of text representations.
The ConversationalRetrievalChain class combines the language model and the vector store to create a chat-based retrieval system. It utilizes the ChatOpenAI model for generating responses based on user queries and the DeepLake vector store for retrieving relevant documents.
The code begins by setting up the required environment variables. It then attempts to load the dataset from an existing location. If the dataset does not exist, it loads documents from a specified folder and creates the dataset using the upload_data function.
Next, the code initializes the retrieval component by creating an instance of the DeepLake class and configuring the retrieval parameters.
Finally, the code sets up the conversational interface, allowing users to interact with the system by asking questions. The chat history is stored and updated with each interaction, and the responses are written to a file named TEST.md.
To use the code, follow these steps:
Set the necessary environment variables as described in the Setup section. Run the code and provide the path to the folder containing the documents when prompted. Interact with the system by asking questions in the command line. The responses will be stored in the TEST.md file.
This Readme provides an overview of the project, explaining its purpose, structure, and components. Following the provided instructions, you can set up and use the code effectively. Feel free to explore and customize the code according to your specific requirements.