Reading list for Multimodal Large Language Models
Updated Aug 17, 2023
Research Trends in LLM-guided Multimodal Learning.
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Deploy Chinese CLIP with OpenCV + onnxruntime for text-to-image search: describe the desired picture in one sentence and retrieve matching images from the gallery. Includes both C++ and Python versions (a minimal retrieval sketch follows this list).
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
A Gradio demo of MGIE
A Video Chat Agent with Temporal Prior
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
An Easy-to-use Hallucination Detection Framework for LLMs.
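The Chinese-CLIP search entry above follows the standard CLIP retrieval recipe: encode the query text and every gallery image into a shared embedding space, then rank images by cosine similarity. Below is a minimal ONNX Runtime sketch of that pattern; the model file names, input names, and preprocessing are assumptions for illustration, not that repo's actual API.

```python
# Minimal sketch of CLIP-style text-to-image retrieval with ONNX Runtime.
# Model file names and input/output names are hypothetical placeholders,
# not the actual files shipped by the repo above.
import numpy as np
import onnxruntime as ort

text_encoder = ort.InferenceSession("clip_text.onnx")    # assumed export
image_encoder = ort.InferenceSession("clip_image.onnx")  # assumed export

def embed(session: ort.InferenceSession, inputs: dict) -> np.ndarray:
    # Run the encoder and L2-normalize, so a dot product equals cosine similarity.
    vec = session.run(None, inputs)[0]
    return vec / np.linalg.norm(vec, axis=-1, keepdims=True)

def build_gallery(image_tensors: list) -> np.ndarray:
    # Precompute normalized embeddings for every image in the gallery.
    feats = [embed(image_encoder, {"pixel_values": t}) for t in image_tensors]
    return np.concatenate(feats, axis=0)

def search(query_tokens: np.ndarray, gallery_feats: np.ndarray, k: int = 5) -> np.ndarray:
    # Embed the one-sentence query and return indices of the top-k images.
    q = embed(text_encoder, {"input_ids": query_tokens})
    scores = gallery_feats @ q.ravel()  # cosine similarity per gallery image
    return np.argsort(-scores)[:k]
```

At query time only the text encoder runs; the gallery embeddings are computed once and cached, which is what makes sentence-in, images-out search fast enough for interactive use.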