Releases: huggingface/pytorch-image-models

Release v1.0.3

15 May 18:19

May 14, 2024

  • Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
  • Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
  • Add normalize= flag for transforms, return non-normalized torch.Tensor with original dtype (for chug)
  • Version 1.0.3 release

May 11, 2024

  • Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.
model                                               top1    top5    param_count (M)  img_size
vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k  86.202  97.874  64.11            256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k  85.418  97.48   60.4             256
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k      84.322  96.812  63.95            256
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k      83.906  96.684  60.23            256
vit_base_patch16_rope_reg1_gap_256.sbb_in1k         83.866  96.67   86.43            256
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k       83.81   96.824  38.74            256
vit_betwixt_patch16_reg4_gap_256.sbb_in1k           83.706  96.616  60.4             256
vit_betwixt_patch16_reg1_gap_256.sbb_in1k           83.628  96.544  60.4             256
vit_medium_patch16_reg4_gap_256.sbb_in1k            83.47   96.622  38.88            256
vit_medium_patch16_reg1_gap_256.sbb_in1k            83.462  96.548  38.88            256
vit_little_patch16_reg4_gap_256.sbb_in1k            82.514  96.262  22.52            256
vit_wee_patch16_reg1_gap_256.sbb_in1k               80.256  95.360  13.42            256
vit_pwee_patch16_reg1_gap_256.sbb_in1k              80.072  95.136  15.25            256
vit_mediumd_patch16_reg4_gap_256.sbb_in12k          N/A     N/A     64.11            256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k          N/A     N/A     60.4             256
  • AttentionExtract helper added to extract attention maps from timm models. See example in #1232 (comment)
  • forward_intermediates() API refined and added to more models including some ConvNets that have other extraction methods.
  • 1017 of 1047 model architectures support features_only=True feature extraction. The remaining architectures can be supported, but will be prioritized by request.
  • Remove torch.jit.script annotated functions, including old JIT activations. They conflict with dynamo, and dynamo does a much better job when used.

April 11, 2024

  • Prepping for a long overdue 1.0 release, things have been stable for a while now.
  • Significant feature that's been missing for a while, features_only=True support for ViT models with flat hidden states or non-std module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*')
  • Above feature support achieved through a new forward_intermediates() API that can be used with a feature wrapping module or directly.
import timm
import torch

model = timm.create_model('vit_base_patch16_224')
input = torch.randn(2, 3, 224, 224)  # dummy batch of two 224x224 RGB images
final_feat, intermediates = model.forward_intermediates(input)
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
# torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
# torch.Size([2, 768, 14, 14]), printed 12 times (once per transformer block)

print(output.shape)
# torch.Size([2, 1000])

# Second example: features_only=True, returning only the blocks selected by out_indices
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:
    print(o.shape)
# torch.Size([2, 768, 32, 32])
# torch.Size([2, 768, 32, 32])
  • TinyCLIP vision tower weights added, thx Thien Tran
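The shapes printed in the forward_intermediates() examples above follow from patch arithmetic. A minimal sketch, assuming a square input, 16-pixel patches, an embedding width of 768, and a single class token in front of the patch tokens; vit_shapes is a hypothetical helper written for this note, not part of timm:

```python
def vit_shapes(img_size, patch_size=16, embed_dim=768, prefix_tokens=1):
    """Return (num_tokens, (C, H, W)) for a square input image.

    prefix_tokens counts class/register tokens placed before the patch tokens
    (an assumption of this sketch; it varies by model).
    """
    grid = img_size // patch_size              # patches per side
    num_tokens = grid * grid + prefix_tokens   # flat token sequence length
    return num_tokens, (embed_dim, grid, grid)


tokens_224, nchw_224 = vit_shapes(224)
print(tokens_224, nchw_224)  # 197 (768, 14, 14)

_, nchw_512 = vit_shapes(512)
print(nchw_512)  # (768, 32, 32)
```

The flat final feature keeps the prefix token (197 = 14 * 14 + 1), while the intermediates are reshaped to a 2D grid of patch tokens only.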

Release v0.9.16

19 Feb 19:35
6e6f368

Feb 19, 2024

  • Next-ViT models added. Adapted from https://github.com/bytedance/Next-ViT
  • HGNet and PP-HGNetV2 models added. Adapted from https://github.com/PaddlePaddle/PaddleClas by SeeFun
  • Removed setup.py, moved to pyproject.toml based build supported by PDM
  • Add updated model EMA impl using _for_each for less overhead
  • Support device args in train script for non GPU devices
  • Other misc fixes and small additions
  • Min supported Python version increased to 3.8
  • Release 0.9.16
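The model EMA item above changes how the update is executed, not the rule itself. As a reminder of that rule, here is a minimal pure-Python sketch with scalars standing in for tensors; ema_update and its decay default are illustrative names for this note and not timm's implementation:

```python
def ema_update(ema_params, model_params, decay=0.999):
    """In-place EMA step: ema = decay * ema + (1 - decay) * param."""
    for i, (e, p) in enumerate(zip(ema_params, model_params)):
        ema_params[i] = decay * e + (1.0 - decay) * p
    return ema_params


ema = [0.0, 1.0]        # running averages
weights = [1.0, 1.0]    # current model weights
ema_update(ema, weights, decay=0.9)
print(ema)  # first entry moves 10% of the way toward 1.0, second stays near 1.0
```

A multi-tensor variant applies the same arithmetic to the whole parameter list in a few batched calls instead of a Python-level loop, which is presumably where the reduced overhead comes from.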

Jan 8, 2024

Datasets & transform refactoring

  • HuggingFace streaming (iterable) dataset support (--dataset hfids:org/dataset)
  • Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
  • Tested HF datasets and webdataset wrapper streaming from HF hub with recent timm ImageNet uploads to https://huggingface.co/timm
  • Make input & target column/field keys consistent across datasets and pass via args
  • Full monochrome support when using e.g. --input-size 1 224 224 or --in-chans 1, sets PIL image conversion appropriately in dataset
  • Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
  • Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
  • Allow train without validation set (--val-split '') in train script
  • Add --bce-sum (sum over class dim) and --bce-pos-weight (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding
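The two BCE tweaks named above can be sketched in plain Python. This is an illustration of the general technique, assuming that the sum option sums the per-class losses before averaging over the batch and that the positive weight scales only the positive-target term; bce_with_logits here is a hypothetical helper, and how the train script wires the flags internally is not shown in these notes:

```python
import math


def bce_with_logits(logits, targets, pos_weight=1.0, sum_classes=False):
    """Binary cross-entropy on raw logits.

    logits, targets: lists of rows, one row per sample, one entry per class.
    pos_weight: multiplier on the positive-target term.
    sum_classes: if True, sum over the class dim and average over the batch;
                 otherwise average over every element.
    """
    rows = []
    for row_x, row_y in zip(logits, targets):
        row = []
        for x, y in zip(row_x, row_y):
            log_p = -math.log1p(math.exp(-x))      # log(sigmoid(x))
            log_not_p = -math.log1p(math.exp(x))   # log(1 - sigmoid(x))
            row.append(-(pos_weight * y * log_p + (1.0 - y) * log_not_p))
        rows.append(row)
    total = sum(sum(r) for r in rows)
    if sum_classes:
        return total / len(rows)
    return total / sum(len(r) for r in rows)


logits = [[0.0, 0.0]]
targets = [[1.0, 0.0]]
mean_loss = bce_with_logits(logits, targets)                    # log(2)
sum_loss = bce_with_logits(logits, targets, sum_classes=True)   # 2 * log(2)
weighted = bce_with_logits(logits, targets, pos_weight=2.0, sum_classes=True)  # 3 * log(2)
print(mean_loss, sum_loss, weighted)
```

Summing over classes makes the loss magnitude grow with the number of classes, which changes the effective learning rate relative to the element-wise mean.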

Release v0.9.12

24 Nov 19:09

Nov 23, 2023

  • Added EfficientViT-Large models, thanks SeeFun
  • Fix Python 3.7 compat, will be dropping support for it soon
  • Other misc fixes
  • Release 0.9.12

Release v0.9.11

20 Nov 23:16

Nov 20, 2023

Release v0.9.10

04 Nov 15:23

Nov 4, 2023

  • Patch fix for 0.9.9 to fix FrozenBatchnorm2d import path for old torchvision (~2 years old)

Nov 3, 2023

  • DFN (Data Filtering Networks) and MetaCLIP ViT weights added
  • DINOv2 'register' ViT model weights added
  • Add quickgelu ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
  • Improved typing added to ResNet, MobileNet-v3 thanks to Aryan
  • ImageNet-12k fine-tuned (from LAION-2B CLIP) convnext_xxlarge
  • 0.9.9 release

Release v0.9.9

03 Nov 22:24

Nov 3, 2023

  • DFN (Data Filtering Networks) and MetaCLIP ViT weights added
  • DINOv2 'register' ViT model weights added
  • Add quickgelu ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
  • Improved typing added to ResNet, MobileNet-v3 thanks to Aryan
  • ImageNet-12k fine-tuned (from LAION-2B CLIP) convnext_xxlarge
  • 0.9.9 release

Release v0.9.8

21 Oct 20:52

Oct 20, 2023

  • SigLIP image tower weights supported in vision_transformer.py.
    • Great potential for fine-tune and downstream feature use.
  • Experimental 'register' support in vit models as per Vision Transformers Need Registers
  • Updated RepViT with new weight release. Thanks wangao
  • Add patch resizing support (on pretrained weight load) to Swin models
  • 0.9.8 release

Release v0.9.7

02 Sep 19:49
730b907

Small bug fix & extra model from v0.9.6

Sep 1, 2023

  • TinyViT added by SeeFun
  • Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
  • 0.9.7 release

Release v0.9.6

29 Aug 19:06

Aug 28, 2023

  • Add dynamic img size support to models in vision_transformer.py, vision_transformer_hybrid.py, deit.py, and eva.py w/o breaking backward compat.
    • Add dynamic_img_size=True to args at model creation time to allow changing the grid size (interpolate abs and/or ROPE pos embed each forward pass).
    • Add dynamic_img_pad=True to allow image sizes that aren't divisible by patch size (pad bottom right to patch size each forward pass).
    • Enabling either dynamic mode will break FX tracing unless PatchEmbed module added as leaf.
    • Existing method of resizing position embedding by passing different img_size (interpolate pretrained embed weights once) on creation still works.
    • Existing method of changing patch_size (resize pretrained patch_embed weights once) on creation still works.
    • Example validation cmd: python validate.py /imagenet --model vit_base_patch16_224 --amp --amp-dtype bfloat16 --img-size 255 --crop-pct 1.0 --model-kwargs dynamic_img_size=True dynamic_img_pad=True
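The bottom-right padding described for dynamic_img_pad=True comes down to a small calculation. A minimal sketch, assuming a 16-pixel patch as in vit_base_patch16_224; pad_to_patch is a hypothetical helper for this note and not a timm function:

```python
def pad_to_patch(height, width, patch_size=16):
    """Return ((pad_bottom, pad_right), (grid_h, grid_w)) for one image size."""
    pad_h = (patch_size - height % patch_size) % patch_size
    pad_w = (patch_size - width % patch_size) % patch_size
    grid = ((height + pad_h) // patch_size, (width + pad_w) // patch_size)
    return (pad_h, pad_w), grid


print(pad_to_patch(255, 255))  # ((1, 1), (16, 16))
print(pad_to_patch(224, 224))  # ((0, 0), (14, 14))
```

With --img-size 255 the image is one pixel short of a multiple of 16 on each side, so one row and one column of padding give a 16 x 16 patch grid, and the position embedding is then interpolated to that grid.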

Aug 25, 2023

Aug 11, 2023

  • Swin, MaxViT, CoAtNet, and BEiT models support resizing of image/window size on creation with adaptation of pretrained weights
  • Example validation cmd to test w/ non-square resize: python validate.py /imagenet --model swin_base_patch4_window7_224.ms_in22k_ft_in1k --amp --amp-dtype bfloat16 --input-size 3 256 320 --model-kwargs window_size=8,10 img_size=256,320
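The window_size=8,10 choice in the command above can be checked against the feature-map size at each stage. A rough sketch, assuming the standard Swin layout of a 4-pixel patch embedding followed by four stages that each halve the spatial size; swin_stage_grids is a hypothetical helper, and any fallback timm applies when a window does not tile evenly is not modeled here:

```python
def swin_stage_grids(img_size, window_size, patch_size=4, num_stages=4):
    """List ((feat_h, feat_w), window_tiles_evenly) for each stage."""
    h, w = img_size[0] // patch_size, img_size[1] // patch_size
    stages = []
    for _ in range(num_stages):
        fits = h % window_size[0] == 0 and w % window_size[1] == 0
        stages.append(((h, w), fits))
        h, w = h // 2, w // 2  # patch merging halves each side
    return stages


for grid, fits in swin_stage_grids((256, 320), (8, 10)):
    print(grid, fits)
# (64, 80) True
# (32, 40) True
# (16, 20) True
# (8, 10) True
```

Under these assumptions an 8 x 10 window tiles every stage of a 256 x 320 input, whereas the original 7 x 7 window would not divide the 64 x 80 first-stage grid.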

Release v0.9.5

03 Aug 23:55

Minor updates and bug fixes. New ResNeXt w/ highest ImageNet eval I'm aware of in the ResNe(X)t family (seresnextaa201d_32x8d.sw_in12k_ft_in1k_384)

Aug 3, 2023

  • Add GluonCV weights for HRNet w18_small and w18_small_v2. Converted by SeeFun
  • Fix selecsls* model naming regression
  • Patch and position embedding for ViT/EVA works for bfloat16/float16 weights on load (or activations for on-the-fly resize)
  • v0.9.5 release prep

July 27, 2023

  • Added timm trained seresnextaa201d_32x8d.sw_in12k_ft_in1k_384 weights (and .sw_in12k pretrain) with 87.3% top-1 on ImageNet-1k, best ImageNet ResNet family model I'm aware of.
  • RepViT model and weights (https://arxiv.org/abs/2307.09283) added by wangao
  • I-JEPA ViT feature weights (no classifier) added by SeeFun
  • SAM-ViT (segment anything) feature weights (no classifier) added by SeeFun
  • Add support for alternative feature extraction methods and negative indices to EfficientNet
  • Add NAdamW optimizer
  • Misc fixes