✨✨Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Updated May 10, 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
[CVPR 2024] 🎬💭 Chat with over 10K frames of video!
Research Trends in LLM-guided Multimodal Learning.
A collection of resources on applications of multi-modal learning in medical imaging.
A Gradio demo of MGIE
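For readers who have not built one of these demos, a minimal Gradio sketch of an instruction-based image-editing UI follows; the `edit_image` stub is a hypothetical placeholder, not MGIE's actual inference code.

```python
import gradio as gr


def edit_image(image, instruction):
    # Placeholder: a real demo would run the editing model here and
    # return the image modified according to the instruction.
    return image


demo = gr.Interface(
    fn=edit_image,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Edit instruction")],
    outputs=gr.Image(type="pil"),
    title="Instruction-based image editing (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```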
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Personal project: MPP-Qwen14B (Multimodal Pipeline Parallel Qwen-14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on a 24GB RTX 3090/4090.
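As a rough illustration of the idea behind splitting a model across GPUs, here is a naive two-GPU sharding sketch in plain PyTorch. It is not the repo's implementation: true pipeline parallelism also splits each batch into micro-batches so both GPUs stay busy, and `TwoStageModel` with its layer sizes is invented for illustration.

```python
import torch
import torch.nn as nn


class TwoStageModel(nn.Module):
    """Naive two-GPU model sharding: the first half of the layers lives on
    cuda:0 and the second half on cuda:1, so a model too large for one 24GB
    card can still run forward/backward. Assumes two visible GPUs."""

    def __init__(self, d=4096, n_layers=8):
        super().__init__()
        half = n_layers // 2
        self.stage0 = nn.Sequential(*[nn.Linear(d, d) for _ in range(half)]).to("cuda:0")
        self.stage1 = nn.Sequential(*[nn.Linear(d, d) for _ in range(n_layers - half)]).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        return self.stage1(x.to("cuda:1"))  # activations hop across devices


model = TwoStageModel()
loss = model(torch.randn(4, 4096)).sum()
loss.backward()  # autograd routes gradients back across both devices
```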
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Curated papers on Large Language Models in the healthcare and medical domain
A from-scratch implementation of a vision-language model in pure PyTorch
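The core recipe such implementations follow is small: encode the image, project the visual features into the LLM's embedding space, and prepend them to the text tokens. A minimal sketch of that recipe follows; all sizes are illustrative, and the `TinyVLM` name and linear "vision encoder" are stand-ins, not taken from any particular repo.

```python
import torch
import torch.nn as nn


class TinyVLM(nn.Module):
    """Minimal sketch: vision features -> linear projector -> LM over the
    concatenated [image tokens; text tokens] sequence."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4,
                 img_dim=768):
        super().__init__()
        # stand-in for a real vision encoder (e.g. a ViT): a linear projector
        self.vision_proj = nn.Linear(img_dim, d_model)
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats, input_ids):
        # image_feats: (B, num_patches, img_dim); input_ids: (B, T)
        img_tok = self.vision_proj(image_feats)     # project to LM width
        txt_tok = self.tok_emb(input_ids)
        seq = torch.cat([img_tok, txt_tok], dim=1)  # prepend image tokens
        return self.head(self.lm(seq))


model = TinyVLM()
logits = model(torch.randn(2, 64, 768), torch.randint(0, 32000, (2, 16)))
print(logits.shape)  # torch.Size([2, 80, 32000])
```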
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation