# vision-language-pretraining

Here are 28 public repositories matching this topic...

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning salesforce image-captioning deep-learning-library vision-framework vision-and-language multimodal-deep-learning multimodal-datasets vision-language-transformer vision-language-pretraining visual-question-anwsering

Updated May 19, 2024
Jupyter Notebook

DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

llama large-language-models video-language-pretraining vision-language-pretraining cross-modal-pretraining blip2 minigpt4 multi-modal-chatgpt

Updated May 24, 2024
Python

deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

foundation-models vision-language-pretraining vision-language-model

Updated Apr 24, 2024
Python

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot llama clip mulit-modal vision-language vicuna gpt-4 vision-language-pretraining llava video-chatboat video-conversation

Updated May 20, 2024
Python

Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

multi-model clip big-model zero-shot self-supervised image-text vision-language-pretraining

Updated Sep 19, 2022
Python

TXH-mercury / VALOR

Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

vision-language-pretraining audio-language-pretraining audiovisual-language-pretraining multimodal-representation-learning

Updated May 28, 2024
Python

ArrowLuo / SegCLIP

PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

transfer-learning semantic-segmentation contrastive-learning zero-shot-semantic-segmentation vision-language-pretraining open-vocabulary open-vocabulary-semantic-segmentation

Updated Jun 28, 2023
Python

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey at https://dl.acm.org/doi/abs/10.1145/3617833.

cross-modal multimodal-deep-learning multimodal-datasets transformer-models multimodal-pre-trained-model vision-language-pretraining multimodal-applications multimodal-pretext

Updated Oct 19, 2023

jusiro / FLAIR

FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

medical-imaging fundus-image-analysis foundation-models vision-language-pretraining

Updated May 15, 2024
Python

Surrey-UPLab / Recognize-Any-Regions

Recognize Any Regions

open-world object-detection zero-shot instance-segmentation auto-labeling vision-language-pretraining open-vocabulary vision-language-model multimodal-representation-learning vision-foundation-model vision-language-foundation-model

Updated Nov 22, 2023
Python

sail-sg / ptp

[CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"

cross-modality vlp vision-language-pretraining

Updated Jun 7, 2023
Python

omipan / svl_adapter

SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models

self-supervised-learning vision-language-pretraining

Updated Jan 11, 2024
Python

TencentARC / FLM

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

language-modeling vision-language-pretraining

Updated May 15, 2023
Python

vgthengane / Continual-CLIP

Official repository for "CLIP model is an Efficient Continual Learner".

baseline clip continual-learning vision-language-pretraining foundational-models

Updated Dec 13, 2022
Python

Zoky-2020 / SGA

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]

adversarial-attack vision-language-pretraining

Updated Sep 6, 2023
Python

YyzHarry / vlm-fairness

Demographic Bias of Vision-Language Foundation Models in Medical Imaging

medical-imaging fairness subpopulation algorithmic-fairness bias-mitigation ood-generalization foundation-models vision-language-pretraining vision-language-model

Updated Feb 23, 2024
Python

ttengwang / VLMixer

VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix (ICML 2022)

vision-language vision-language-pretraining

Updated Jun 16, 2022

LooperXX / ManagerTower

Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

vision-language multi-modal-learning vision-language-pretraining vision-language-learning

Updated Dec 12, 2023
Python

TXH-mercury / COSA

Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

video-captioning video-qa video-retrieval vision-language-pretraining video-language-pretrainng

Updated Aug 1, 2023
Python

yiren-jian / BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

multimodal-deep-learning vision-language-transformer vision-language-pretraining

Updated Dec 5, 2023
Python
