VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
A list of research papers on knowledge-enhanced multimodal learning
VTC: Improving Video-Text Retrieval with User Comments
code for studying OpenAI's CLIP explainability
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Instruction Following Agents with Multimodal Transformers
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LAVIS - A One-stop Library for Language-Vision Intelligence