LAVIS - A One-stop Library for Language-Vision Intelligence
Updated May 19, 2024 (Jupyter Notebook)
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
Recognize Any Regions
[CVPR 2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Official repository for "CLIP model is an Efficient Continual Learner".
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]
Demographic Bias of Vision-Language Foundation Models in Medical Imaging
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix (ICML 2022)
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training