VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Updated May 23, 2024 · Python
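The VideoMAE entry above refers to masked video pre-training, whose key step is "tube masking": one spatial mask is sampled and repeated across all frames, so masked patches form tubes along the time axis and cannot be trivially recovered from neighboring frames. A minimal sketch of that masking step (function name and patch-grid sizes are illustrative, not from the repo):

```python
import numpy as np

def tube_mask(num_frames, h_patches, w_patches, mask_ratio=0.9, seed=0):
    """Sample one spatial mask and broadcast it across all frames,
    so masked patches form 'tubes' along the time axis (as in VideoMAE)."""
    rng = np.random.default_rng(seed)
    n_spatial = h_patches * w_patches
    n_masked = int(round(n_spatial * mask_ratio))
    flat = np.zeros(n_spatial, dtype=bool)
    # Mask a high fraction (e.g. 90%) of spatial patch positions.
    flat[rng.choice(n_spatial, size=n_masked, replace=False)] = True
    spatial = flat.reshape(h_patches, w_patches)
    # Same spatial pattern in every frame -> temporal "tubes".
    return np.broadcast_to(spatial, (num_frames, h_patches, w_patches))

# Example: 8 frames of a 14x14 patch grid (ViT-style 224px input, 16px patches).
mask = tube_mask(8, 14, 14, mask_ratio=0.9)
```

The high masking ratio is what makes the pre-training data-efficient: the encoder only processes the small visible subset of patches.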
Self-configuring and adapting vision transformer for segmentation of 3D images
A repo for practicing DL/genAI
Pre-training a VisionTransformer with Masked Image Modelling for semantic segmentation
A curated list of foundation models for vision and language tasks
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Scenic: A Jax Library for Computer Vision Research and Beyond
The official repo for [IJCAI'24] "LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation"
Investigate possibilities for Vision Transformers with multiscale grids
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
A series of foundational computer vision projects that anyone diving into the field must tackle.
[ICLR 2024 Oral] Less is More: Fewer Interpretable Region via Submodular Subset Selection
The Brain Tumor MRI Dataset from Kaggle is employed for automated brain tumor detection and classification research. Methods investigated include pre-trained models (VGG16, ResNet50, and ViT). 🧠🔍
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment: Robotic Learning Datasets and RT-X Models"
Video Foundation Models & Data for Multimodal Understanding
EfficientViT is a new family of vision models for efficient high-resolution vision.
OpenMMLab Detection Toolbox and Benchmark
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
A tool for classifying an image into a disaster type, utilizing Python