This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Updated Apr 12, 2024 (Python)
Microsoft COCO: Common Objects in Context for Hugging Face Datasets
FQDet: Fast-converging Query-based Detector
Python library for converting annotated datasets into various formats (e.g., image classification, object detection and speech datasets).
Object Detection Dataset Format Converter
A repository supporting the development of a data repository and an interchange format for weed identification annotations
A tool for converting computer vision label formats.
CVNets: A library for training computer vision networks
Image caption generator using a pretrained ResNet-50 and an LSTM. Trained on the COCO 2017 dataset, it is accessible via a Streamlit app.
COCOA: Semantic Amodal Segmentation for Hugging Face Datasets
Trident Pyramid Networks for Object Detection (BMVC 2022)
Official ImageNet Model repository
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
Object detection with multi-level representations generated from deep high-resolution representation learning (HRNetV2h). This is an official implementation for our TPAMI paper "Deep High-Resolution Representation Learning for Visual Recognition". https://arxiv.org/abs/1908.07919
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Adds the SPICE metric to the coco-caption evaluation server code
Semantic Propositional Image Caption Evaluation
Clone of the COCO API (dataset: http://cocodataset.org/) with changes to support Windows builds and Python 3
Official implementation of "Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation"