This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Phi-3-Vision model test - running locally
[arXiv 2023] PyTorch code for "Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval"
[EMNLP 2022] PyTorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"
Official Code for the ACL 2024 (Findings) paper - ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
A curated list of awesome Multimodal studies.
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
Tiny and simple implementation of multimodal models
An open-source framework for training large multimodal models.
A Comparative Framework for Multimodal Recommender Systems
Research Code for Multimodal-Cognition Team in Ant Group
Deep Multimodal Guidance for Medical Image Classification: https://arxiv.org/pdf/2203.05683.pdf
A repository for CS4ML, a general framework for active learning in regression problems. It approximates a target function from general types of data, rather than from pointwise samples.
[ABAW6 (CVPR Workshop)] Second-place solution for the valence-arousal challenge of ABAW6
Corpus of resources for multimodal machine learning with physiological signals
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"