# large-multimodal-models

Here are 19 public repositories matching this topic.
MixEval: a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures. It evaluates LLMs with highly accurate model ranking (0.96 correlation with Chatbot Arena) while running locally and quickly (6% of the time and cost of MMLU), and its queries are stably updated every month to avoid contamination.
Updated May 31, 2024 · Python
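Benchmark-to-Arena agreement figures like MixEval's 0.96 are typically rank correlations. As a minimal illustrative sketch (not MixEval's actual code, and with made-up scores), Spearman's rho can be computed as Pearson correlation over ranks:

```python
# Sketch only: Spearman rank correlation between benchmark scores and
# Chatbot Arena ratings. All numbers below are hypothetical.

def ranks(xs):
    # 1-based ranks, averaging ranks over ties
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied rank positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation computed on the rank vectors
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical benchmark scores and Arena ratings for five models
bench = [71.2, 68.5, 66.0, 60.3, 55.1]
arena = [1250, 1230, 1190, 1200, 1100]
print(round(spearman(bench, arena), 2))  # prints 0.9
```

A rho near 1.0 means the benchmark orders models almost exactly as Arena does, even if the raw scores are on different scales.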
Code for "Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning"
The official implementation of "Instruction-Guided Visual Masking"
Updated May 31, 2024 · Jupyter Notebook
Awesome multi-modal large language model papers/projects, with collections of popular training strategies, e.g., PEFT and LoRA.
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Updated May 19, 2024 · Python
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Updated Apr 16, 2024 · Python
A curated list of awesome Multimodal studies.
Updated May 27, 2024 · HTML
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Updated May 31, 2024 · Python
An open-source implementation of LLaVA-NeXT.
Updated May 30, 2024 · Python
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Updated Dec 25, 2023 · Python
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
Updated Apr 17, 2024 · Python
Open Platform for Embodied Agents
Updated May 27, 2024 · Python
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Updated Mar 27, 2024 · Python
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Updated May 22, 2024 · Python
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Updated May 31, 2024 · Python
A collection of resources on applications of multi-modal learning in medical imaging.
A Framework of Small-scale Large Multimodal Models
Updated May 30, 2024 · Python
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Updated Feb 1, 2024 · Python
AI-first process automation with large language (LLMs), action (LAMs), multimodal (LMMs), and visual language (VLMs) models
Updated May 30, 2024 · Python