
ai-llm-playground

Experiments with OpenAI Assistant API, Langchain, Embedding, and Agents


Restaurant Advisor (OpenAI Assistant version)

This is the continuation of the Restaurant Advisor (outdated Langchain + Redis version) project. I decided to get rid of Langchain and switch to the native OpenAI API. There are a few reasons for this:

  • The OpenAI API now supports agents out of the box (Assistants + Threads). This is basically all I need for my development
  • I prefer controllable low-level solutions over magical boxes. It is hard to override a standard LangChain Agent's behaviour (prompts, output parsers) when you hit its limitations. I found it easier and more flexible to write my own custom code than to use the predefined retrieval tools.
  • I got tired of dealing with issues after updating GPT models and libraries

So, it's pure OpenAI API now.

  • I use the Assistant API with custom tools: vector semantic search with location pre-filtering (MongoDB Atlas) and image generation (DALL-E 3)
  • I don't need Redis for conversation history anymore because OpenAI Threads do the same
  • I use the latest GPT-4 Turbo model
  • I use OpenAI voice generation

The core of this project is an Assistant with a few tools. It is capable of the following (a rough sketch of the setup is shown after the list):

  • keep up the conversation with the user and suggest restaurants and dishes from the database
  • understand when to query the database, come up with queries to find the best restaurants nearby, and use the results in the conversation
  • understand when a user wants to see a particular dish and generate an image of it using DALL-E 3
  • reliably reply in JSON format even though the Assistant API doesn't have a built-in JSON output mode
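
For reference, here is roughly what wiring such an Assistant together looks like with the OpenAI Python SDK (v1). This is a minimal sketch: the tool names, parameter schemas, and instructions are illustrative, not the project's actual code.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    assistant = client.beta.assistants.create(
        name="Restaurant Advisor",
        model="gpt-4-1106-preview",
        instructions=(
            "You are a restaurant advisor. Use the tools to search the database "
            "and to generate dish images. Always reply with a valid JSON object."
        ),
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "find_restaurants",  # hypothetical tool name
                    "description": "Vector search over restaurants near the user",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string"},
                            "lat": {"type": "number"},
                            "lng": {"type": "number"},
                        },
                        "required": ["query", "lat", "lng"],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "generate_dish_image",  # hypothetical tool name
                    "description": "Generate an image of a dish with DALL-E 3",
                    "parameters": {
                        "type": "object",
                        "properties": {"dish_description": {"type": "string"}},
                        "required": ["dish_description"],
                    },
                },
            },
        ],
    )

    # A Thread replaces the Redis conversation history from the older version
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="Find me a good sushi place nearby"
    )
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

The run is then polled; when it stops with a requires_action status, the requested tool calls (the MongoDB search or a DALL-E 3 client.images.generate call) are executed locally and their outputs are submitted back via submit_tool_outputs.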

advisor-chat-example.png

Examples of images generated during a conversation:

I'll add more details later about how to create the database with indexes in MongoDB Atlas and how to deploy this to AWS. I plan to create some architectural diagrams as well. There is not that much to architect here, but the tools and the location pre-filtering trick deserve some explanation for those who want to do the same.
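
In the meantime, here is a rough sketch of the location pre-filtering idea with pymongo. The database, collection, index, and field names are made up, and the exact stage syntax depends on your Atlas Search version (knnBeta here; newer clusters also offer a dedicated $vectorSearch stage):

    from pymongo import MongoClient

    client = MongoClient("<your Atlas connection string>")
    restaurants = client["advisor"]["restaurants"]  # hypothetical db/collection names

    def find_restaurants(query_embedding, lng, lat, radius_meters=5000, k=5):
        pipeline = [
            {
                "$search": {
                    "index": "restaurant_index",  # hypothetical Atlas Search index name
                    "knnBeta": {
                        "path": "embedding",
                        "vector": query_embedding,
                        "k": k,
                        # the geoWithin filter limits candidates to a circle around
                        # the user before the nearest-neighbour ranking is applied
                        "filter": {
                            "compound": {
                                "must": [
                                    {
                                        "geoWithin": {
                                            "path": "location",
                                            "circle": {
                                                "center": {"type": "Point", "coordinates": [lng, lat]},
                                                "radius": radius_meters,
                                            },
                                        }
                                    }
                                ]
                            }
                        },
                    },
                }
            },
            {"$project": {"name": 1, "cuisine": 1, "score": {"$meta": "searchScore"}}},
        ]
        return list(restaurants.aggregate(pipeline))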

Web Scraper based on Vision API

Credits go to the gpt4v-browsing repo as a source of copy-paste inspiration. The idea is nice and simple: take a screenshot of a web page, ask the OpenAI Vision API to recognize what is on it, and then answer my questions about the page content. And maybe even navigate through pages and click buttons to collect more data before answering. I always hated scraping autogenerated HTML pages, so this is a super nice alternative. If it works, of course. To be honest, I'm not so sure about it. Let's find out.

A few problems I ran into:

  • The Vision API is not so good at answering questions about a page's text content. For some reason, the result is better if you first ask the Vision API to extract some specific text from the screenshot and then ask the text completion API to answer your question using the text from the previous step (see the sketch after this list).
  • The Vision API refuses to recognize large amounts of text on the page, so it is not possible to ask it to extract all the text. You have to be specific and ask it to extract only some of it. Ideally, the Vision API prompt should also be constructed from the original question.
  • The Vision API cannot extract all the related text. For example, when I ask it to give me all the horror movies it sees on the page, it never returns all of them.
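
A minimal sketch of that two-step approach with the OpenAI Python SDK (the model names are the ones available at the time of writing and may need updating):

    import base64
    from openai import OpenAI

    client = OpenAI()

    def answer_about_page(screenshot_path: str, question: str) -> str:
        # Step 1: ask the Vision model to extract only the text relevant to the question
        with open(screenshot_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        vision = client.chat.completions.create(
            model="gpt-4-vision-preview",
            max_tokens=1000,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Extract the text from this page that is relevant to: {question}"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        extracted = vision.choices[0].message.content

        # Step 2: answer the original question using the extracted text only
        answer = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {"role": "system", "content": "Answer using only the provided page text."},
                {"role": "user", "content": f"Page text:\n{extracted}\n\nQuestion: {question}"},
            ],
        )
        return answer.choices[0].message.content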

What I have in mind for this project so far:

  • Give a URL and a task to the agent. For example, some cinema site and a task to find a movie to watch tonight.
  • The agent should be able to take a screenshot of the root page, come up with the right vision prompt, extract some text, and decide whether it wants to navigate further or not.
  • To make navigation possible, the agent should be able to recognize links on the page.
  • It would be nice to take an IMDb rating into account before suggesting a movie to watch, so getting a rating through an API call should be one of the agent's tools.
  • I'm going to use the Puppeteer JS library to take screenshots and navigate through pages. It might be tricky to integrate a JS library into Python code; I'll see how it goes (one possible bridge is sketched below).
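
One simple way to bridge the two runtimes is to keep Puppeteer in a small Node script and call it from Python. The screenshot.js script and its command-line arguments below are hypothetical:

    import subprocess
    from pathlib import Path

    def take_screenshot(url: str, out_path: str = "screenshot.png") -> Path:
        # screenshot.js is a hypothetical Node script that uses Puppeteer to open
        # the URL and save a full-page screenshot to the given file path
        subprocess.run(["node", "screenshot.js", url, out_path], check=True)
        return Path(out_path)

An alternative is pyppeteer or Playwright for Python, which keeps everything in a single runtime.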

More stuff coming later

Project Setup is in a separate web-scraper/README.md file.

Chat with PDF documents using OpenAI Assistant API

This is a better version of the Chat with Multiple Documents (Outdated Langchain version) because it uses the native OpenAI Assistant API with the latest model. There is no need to parse PDFs manually and upload their text content into vector stores; it is all done on the OpenAI side.
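
A minimal sketch of that setup with the Assistants API as it existed at the time of writing (the retrieval tool and file_ids parameter; the file name and prompts are examples):

    from openai import OpenAI

    client = OpenAI()

    # Upload the PDF and let OpenAI handle parsing, chunking, and embedding
    file = client.files.create(file=open("report.pdf", "rb"), purpose="assistants")

    assistant = client.beta.assistants.create(
        name="PDF Chat",
        model="gpt-4-1106-preview",
        instructions="Answer questions using the attached documents.",
        tools=[{"type": "retrieval"}],
        file_ids=[file.id],
    )

    thread = client.beta.threads.create(messages=[
        {"role": "user", "content": "Summarize the key findings of the report."}
    ])
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
    # ... poll the run until it completes, then read the assistant's reply:
    # client.beta.threads.messages.list(thread_id=thread.id)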

summarizer_1

This agent wants to be convinced:

summarizer_2

Restaurant Advisor (Outdated Langchain + Redis version)

This chatbot is aware of a restaurant database in MongoDB and is capable of finding the best one nearby. It combines vector semantic search with geo-location MongoDB Atlas index search. It keeps the chatbot conversation history in Redis. It is quite awesome, the most advanced AI project I have done so far.

I chose MongoDB as a vector store for multiple reasons:

  • I can keep whole documents in the cloud, not only vectors
  • My documents are not just text chunks but complex JSON objects with a schema
  • Each document has embedding and location fields that are indexed and can be used for fast semantic and geo-location search
  • I use geo-location search as a pre-filter for the vector search, i.e. I limit the search to the restaurants nearby and then use vector search to find the best one
  • I can use MongoDB native queries if I hit limitations of the Langchain API (or in case of bugs, which I encountered a few times)

I plan to deploy this to AWS Lambda eventually (I hope soon), so I need to keep the conversation history somewhere. I chose Redis: it is supported by Langchain. The application supports Streamlit and Flask servers.
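
A minimal sketch of Redis-backed conversation memory with Langchain (import paths vary between Langchain versions; the session id and URL are examples):

    from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory

    # Message history lives in Redis under a per-user session key, so a stateless
    # Lambda invocation can pick up the conversation where it left off
    history = RedisChatMessageHistory(session_id="user-42", url="redis://localhost:6379/0")
    memory = ConversationBufferMemory(
        chat_memory=history, memory_key="chat_history", return_messages=True
    )

The memory object is then passed to the conversational chain or agent as usual.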

To start it locally, run a Redis container using the docker-compose.yml:

docker-compose up

Then start the Python application as usual (see below).

restaurant-advisor.png

AI Girlfriend

Okay, this is not actually a girlfriend but more like an interesting person with some technical background. At first, I took some custom prompts for chatbots with an AI-girlfriend personality from FlowGPT. But they all were either anime or virtual-sex oriented (usually both), which I found rather boring. I came up with my own prompt that focuses on making the chatbot more alive and natural. I prohibited her from mentioning that she is an AI and gave her some background in engineering, so she is quite nerdy. I also tried to make her more open-minded than a regular ChatGPT, so she has some temper and can even insult you (she called me stupid once). She can talk using an AI-generated voice, which is very impressive.

In the end, this is a simple chatbot created with Langchain, Streamlit, and the OpenAI API. It can voice-talk almost like a real human using the ElevenLabs API.
I use the ElevenLabs API (which has a free tier) to generate a voice in the browser (Streamlit allows playing it).
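
A minimal sketch of the voice part, assuming the older elevenlabs SDK with the set_api_key/generate helpers (newer versions use a client object) and an example voice name:

    import os
    import streamlit as st
    from elevenlabs import generate, set_api_key  # older elevenlabs SDK helpers

    set_api_key(os.environ["ELEVEN_API_KEY"])

    def speak(text: str) -> None:
        # Generate speech with ElevenLabs and play it right on the Streamlit page.
        # "Rachel" is just an example voice name.
        audio = generate(text=text, voice="Rachel")
        st.audio(audio, format="audio/mp3")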

ai-girlfriend.png

Chat with Multiple Documents (Outdated Langchain version)

Here I use a vector database to store the content of txt documents. Langchain with the stuff chain type allows querying this store and using the results when chatting with the LLM.
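
A minimal sketch of that pipeline with Langchain 0.0.x-era imports (FAISS stands in here for whatever vector store the project actually uses, and the file name is an example):

    from langchain.document_loaders import TextLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI

    # Load a txt document, split it into chunks, and embed the chunks into a store
    docs = TextLoader("notes.txt").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
    store = FAISS.from_documents(chunks, OpenAIEmbeddings())

    # "stuff" simply stuffs all retrieved chunks into a single prompt for the LLM
    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
        chain_type="stuff",
        retriever=store.as_retriever(),
    )
    print(qa.run("What are these notes about?"))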

multi-doc.png

Setup

Rename the .env.template file into .env and fill in the values.

Pipenv setup

I use pipenv to manage dependencies: install it, create a virtual environment, activate it, and install the project packages.

  1. Install pipenv using the official docs. For example, on Mac:

    pip install pipenv --user
  2. Add pipenv to your PATH if it's not there. For example, I had to add the following line to my ~/.zshrc file:

    export PATH="/Users/hiper2d/Library/Python/3.11/bin:$PATH"
  3. Install packages and create a virtual environment for the project:

    cd <project dir> # navigate to the project dir
    pipenv install

    This should create a virtual environment and install all dependencies from the Pipfile.lock file.

    If for any reason you need to create a virtual environment manually, use the following command:

    pip install virtualenv # install virtualenv if you don't have it
    virtualenv --version # check if it's installed
    cd <virtualenv dir> # for example, my virtual envs are here: /Users/hiper2d/.local/share/virtualenvs
    virtualenv <virtualenv name> # I usually use a project name
  4. To switch to the virtual environment, use the following command:

    cd <project dir>
    pipenv shell

    If this fails, then do the following:

    cd <virtualenv dir>/bin
    source activate

IntelliJ IDEA/PyCharm Run/Debug setup

  1. Add a Python interpreter. IntelliJ IDEA will generate a virtual environment for you.

    • Go to Project Settings > SDK > Add SDK > Python SDK > Pipenv Environment
    • Add paths to python and pipenv like this: add-python-interpreter.png
  2. Create a Python StreamLit Run/Debug configuration like this: streamlit-run-debug-config.png

  3. Create a Python Flask Run/Debug configuration (in dish-adviser only) like this: flask-run-debug-config.png
