# large-multimodal-models

Here are 19 public repositories matching this topic.
MixEval: a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures. It evaluates LLMs with highly accurate model ranking (0.96 correlation with Chatbot Arena) while running locally and quickly (6% of the time and cost of MMLU), and its queries are stably updated every month to avoid contamination.
Updated May 31, 2024 · Python
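Benchmark-to-Arena agreement figures like MixEval's 0.96 are typically rank correlations. As a minimal illustrative sketch (not MixEval's actual code, and with made-up scores), Spearman's rho can be computed as Pearson correlation over ranks:

```python
# Sketch only: Spearman rank correlation between benchmark scores and
# Chatbot Arena ratings. All numbers below are hypothetical.

def ranks(xs):
    # 1-based ranks, averaging ranks over ties
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied rank positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation computed on the rank vectors
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical benchmark scores and Arena ratings for five models
bench = [71.2, 68.5, 66.0, 60.3, 55.1]
arena = [1250, 1230, 1190, 1200, 1100]
print(round(spearman(bench, arena), 2))  # prints 0.9
```

A rho near 1.0 means the benchmark orders models almost exactly as Arena does, even if the raw scores are on different scales.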
Code for "Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning"
The official implementation of "Instruction-Guided Visual Masking"
Updated May 31, 2024 · Jupyter Notebook
Awesome multi-modal large language model papers/projects, with collections of popular training strategies, e.g., PEFT and LoRA.
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Updated May 19, 2024 · Python
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
Updated Apr 16, 2024 · Python
A curated list of awesome Multimodal studies.
Updated May 27, 2024 · HTML
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Updated May 31, 2024 · Python
An open-source implementation of LLaVA-NeXT.
Updated May 30, 2024 · Python
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Updated Dec 25, 2023 · Python
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
Updated Apr 17, 2024 · Python
Open Platform for Embodied Agents
Updated May 27, 2024 · Python
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Updated Mar 27, 2024 · Python
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Updated May 22, 2024 · Python
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Updated May 31, 2024 · Python
A collection of resources on applications of multi-modal learning in medical imaging.
A Framework of Small-scale Large Multimodal Models
Updated May 30, 2024 · Python
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Updated Feb 1, 2024 · Python
AI-first process automation with large language (LLMs), action (LAMs), multimodal (LMMs), and visual language (VLMs) models
Updated May 30, 2024 · Python