Multimodal-AI-Chat-Application

Build your own ChatGPT locally, with the ability to interact with images, PDFs, and audio.

Features

  • Quantized Model Integration: This app uses quantized models, which are designed to run well on regular consumer hardware like the machines most of us have at home or in the office. The original versions of these models are large and need more powerful computers, but their quantized counterparts are smaller and more efficient without losing much performance, so you can use the app and all its features without a super powerful computer (see the loading sketch after this list). Quantized Models from TheBloke

  • Audio Chatting with Whisper AI: Leveraging Whisper AI's robust transcription capabilities, this app offers a sophisticated audio messaging experience. The integration of Whisper AI allows for accurate interpretation of and response to voice inputs, enhancing the natural flow of conversations (see the transcription sketch after this list). Whisper models

  • Image Chatting with LLaVA: The app integrates LLaVA for image processing, essentially a fine-tuned LLaMA model equipped to understand image embeddings. Those embeddings are generated by a CLIP model, so LLaVA works like a pipeline that brings together advanced text and image understanding. With LLaVA, the chat experience becomes more interactive and engaging, especially when handling and conversing about visual content (see the loading sketch after this list). llama-cpp-python repo for Llava loading

  • PDF Chatting with Chroma DB: The app is tailored for both professional and academic use, integrating Chroma DB as a vector database for efficient PDF interactions. Users can chat with their own PDF files locally on their device, whether they are business reports, academic papers, or any other documents. The AI understands and responds to the content of these files, so you can extract insights and summaries and engage in a unique form of dialogue with the text in your PDFs (see the retrieval sketch after this list). Chroma website
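
For illustration, here is a minimal sketch of loading the quantized Mistral model and the LLaVA pair named in the Getting Started section with llama-cpp-python. The paths, context size, and image URL are assumptions; the app's own loading code may differ.

```python
# Minimal sketch (not the app's actual code) of loading the quantized models
# with llama-cpp-python. Adjust model_path values to wherever you saved them.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Quantized Mistral for plain text chat.
mistral = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
    n_ctx=2048,
)
reply = mistral.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a quantized model is."}]
)
print(reply["choices"][0]["message"]["content"])

# LLaVA for image chat: the CLIP projector (mmproj) supplies the image embeddings.
chat_handler = Llava15ChatHandler(clip_model_path="models/mmproj-model-f16.gguf")
llava = Llama(
    model_path="models/ggml-model-q5_k.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,
    logits_all=True,  # the LLaVA chat handler needs access to the full logits
)
image_reply = llava.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "file:///path/to/photo.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }]
)
print(image_reply["choices"][0]["message"]["content"])
```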
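
A similarly minimal sketch of the audio step, assuming the open-source whisper package, the "base" model size, and a hypothetical voice_message.wav file; the app may load Whisper differently.

```python
# Transcribe a recorded voice message and hand the text on to the chat model.
import whisper

model = whisper.load_model("base")              # small, general-purpose model
result = model.transcribe("voice_message.wav")  # hypothetical recording
print(result["text"])                           # transcription used as the user's message
```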
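
And a retrieval sketch of the PDF flow with Chroma as the local vector store. The chunking strategy, the sentence-transformers embedding model, and the file name are stand-ins, not necessarily what the app uses.

```python
# Split a PDF into chunks, embed them, store them in a local Chroma collection,
# and fetch the passages most relevant to a question.
import chromadb
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

reader = PdfReader("report.pdf")  # hypothetical file name
chunks = [page.extract_text() for page in reader.pages if page.extract_text()]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")  # persisted locally on disk
collection = client.get_or_create_collection("pdfs")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

question = "What are the main findings?"
hits = collection.query(
    query_embeddings=embedder.encode([question]).tolist(),
    n_results=3,
)
print(hits["documents"][0])  # top passages passed to the chat model as context
```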

Getting Started

To get started with Local Multimodal AI Chat, clone the repository and follow these simple steps:

  1. Create a Virtual Environment: I am currently using Python 3.10.12

  2. Upgrade pip: pip install --upgrade pip

  3. Install Requirements: pip install -r requirements.txt

    Windows Users: The installation might differ a bit for you; if you encounter errors you can't solve, please open an issue here on GitHub.

  4. Setting Up Local Models: Download the models you want to use. Here is the LLaVA model I used for image chat (ggml-model-q5_k.gguf and mmproj-model-f16.gguf), and the quantized Mistral model from TheBloke (mistral-7b-instruct-v0.1.Q5_K_M.gguf).

  5. Customize config file: Check the config file and adjust it to match the models you downloaded.

  6. Optional - Change Profile Pictures: Place your user_image.png and/or bot_image.png inside the chat_icons folder.

  7. Enter commands in terminal:

    1. python3 database_operations.py (this initializes the SQLite database for the chat sessions; see the sketch below)
    2. streamlit run app.py
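
For reference, a minimal illustration of what the SQLite initialization in step 7 can look like. This is not the repository's actual database_operations.py, and the table layout is an assumption.

```python
# Hypothetical sketch of creating a SQLite table for chat sessions.
import sqlite3

def init_db(path: str = "chat_sessions.db") -> None:
    """Create the chat-history table if it does not already exist."""
    with sqlite3.connect(path) as conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                sender TEXT NOT NULL,        -- 'user' or 'assistant'
                content TEXT NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
            """
        )

if __name__ == "__main__":
    init_db()
```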

Possible Improvements

  • Add Model Caching.
  • Add Images and Audio to Chat History Saving and Loading.
  • Use a Database to Save the Chat History.
  • Integrate Ollama, OpenAI, Gemini, or Other Model Providers.
  • Add Image Generator Model.
  • Authentication Mechanism.
  • Change Theme.
  • Separate Frontend and Backend Code for Better Deployment.
