[Feature Request] Multi-modal RAG (Retrieval-Augmented Generation) #445

FUYICC · 2024-01-30T08:38:57Z

The current RAG pipeline can only retrieve text. It cannot retrieve or extract information from images, audio, or other modalities.

For images:

For other modalities:
tbc
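
To make the request above more concrete, here is a minimal sketch of cross-modal retrieval. It assumes some embedding model (for example a VLM) that maps text and images into one shared vector space. That model is not shown: the vectors below are hand-written placeholders, and `Item`, `cosine`, and `retrieve` are hypothetical names for illustration, not part of any existing API.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    modality: str        # "text" or "image"
    ref: str             # identifier of the stored document or image
    vector: list[float]  # embedding in the (assumed) shared space


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vector: list[float], index: list[Item], top_k: int = 2) -> list[Item]:
    """Rank all items by similarity to the query, regardless of modality."""
    ranked = sorted(index, key=lambda item: cosine(query_vector, item.vector), reverse=True)
    return ranked[:top_k]


# Placeholder embeddings; a real system would compute these with the embedding model.
index = [
    Item("text", "doc: quarterly report", [0.9, 0.1, 0.0]),
    Item("image", "img: cat_photo.png", [0.1, 0.9, 0.1]),
    Item("text", "doc: cat care guide", [0.2, 0.8, 0.2]),
]

# Placeholder embedding of a text query such as "pictures of cats".
query = [0.0, 1.0, 0.1]
results = retrieve(query, index)
for item in results:
    print(item.modality, item.ref)
```

Because every item lives in the same vector space, a single text query can return images and text passages together, which is the behavior the current text-only retrieval lacks.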

FUYICC added the enhancement New feature or request label Jan 30, 2024

FUYICC assigned lightaime and FUYICC and unassigned lightaime Jan 30, 2024

FUYICC linked a pull request Feb 1, 2024 that will close this issue

Integration of VLM embedding model #446

Open
