add simplenet architecture #1679 (Open)

Coderx7 wants to merge 3 commits into main
Conversation

@Coderx7 commented Feb 16, 2023

This pull request adds the SimpleNet architecture. SimpleNetV1 is a 2016 architecture composed of only the most basic operators, forming a plain CNN. It outperformed many deeper and more complex architectures such as VGGNet and ResNet on several benchmark datasets. Below are its results on the ImageNet dataset.

- added simplenet.py to timm/models
- added simplenet.md to docs/models
- added an entry to docs/models.md

Here is some more information on how they perform, taken from our official PyTorch repository:

| Model | #Params | ImageNet (top1 / top5) | ImageNet-Real-Labels (top1 / top5) |
|---|---|---|---|
| simplenetv1_9m_m2 (36.3 MB) | 9.5m | 74.23 / 91.748 | 81.22 / 94.756 |
| simplenetv1_5m_m2 (22 MB) | 5.7m | 72.03 / 90.324 | 79.328 / 93.714 |
| simplenetv1_small_m2_075 (12.6 MB) | 3m | 68.506 / 88.15 | 76.283 / 92.02 |
| simplenetv1_small_m2_05 (5.78 MB) | 1.5m | 61.67 / 83.488 | 69.31 / 88.195 |

SimpleNet performs very decently: it outperforms VGGNet, variants of ResNet, and MobileNets (v1 through v3), and it's pretty fast as well, all using a plain old CNN!

Here's an example benchmark run on the small variants of SimpleNet and some other well-known architectures such as the MobileNets.
The small variants of SimpleNet consistently achieve a strong speed/accuracy trade-off:

| model | samples_per_sec | param_count | top1 | top5 |
|---|---|---|---|---|
| simplenetv1_small_m1_05 | 3100.26 | 1.51 | 61.122 | 82.988 |
| mobilenetv3_small_050 | 3082.85 | 1.59 | 57.89 | 80.194 |
| lcnet_050 | 2713.02 | 1.88 | 63.1 | 84.382 |
| simplenetv1_small_m2_05 | 2536.16 | 1.51 | 61.67 | 83.488 |
| mobilenetv3_small_075 | 1793.42 | 2.04 | 65.242 | 85.438 |
| tf_mobilenetv3_small_075 | 1689.53 | 2.04 | 65.714 | 86.134 |
| simplenetv1_small_m1_075 | 1626.87 | 3.29 | 67.784 | 87.718 |
| tf_mobilenetv3_small_minimal_100 | 1316.91 | 2.04 | 62.908 | 84.234 |
| simplenetv1_small_m2_075 | 1313.6 | 3.29 | 68.506 | 88.15 |
| mobilenetv3_small_100 | 1261.09 | 2.54 | 67.656 | 87.634 |
| tf_mobilenetv3_small_100 | 1213.03 | 2.54 | 67.924 | 87.664 |
| mnasnet_small | 1089.33 | 2.03 | 66.206 | 86.508 |
| mobilenetv2_050 | 857.66 | 1.97 | 65.942 | 86.082 |
| dla46_c | 537.08 | 1.3 | 64.866 | 86.294 |
| dla46x_c | 323.03 | 1.07 | 65.97 | 86.98 |
| dla60x_c | 301.71 | 1.32 | 67.892 | 88.426 |

And this is a sample for larger models:

| model | samples_per_sec | param_count | top1 | top5 |
|---|---|---|---|---|
| simplenetv1_small_m1_075 | 2893.91 | 3.29 | 67.784 | 87.718 |
| simplenetv1_small_m2_075 | 2478.41 | 3.29 | 68.506 | 88.15 |
| vit_tiny_r_s16_p8_224 | 2337.23 | 6.34 | 71.792 | 90.822 |
| simplenetv1_5m_m1 | 2105.06 | 5.75 | 71.548 | 89.94 |
| simplenetv1_5m_m2 | 1754.25 | 5.75 | 72.03 | 90.324 |
| resnet18 | 1750.38 | 11.69 | 69.744 | 89.082 |
| regnetx_006 | 1620.25 | 6.2 | 73.86 | 91.672 |
| mobilenetv3_large_100 | 1491.86 | 5.48 | 75.766 | 92.544 |
| tf_mobilenetv3_large_minimal_100 | 1476.29 | 3.92 | 72.25 | 90.63 |
| tf_mobilenetv3_large_075 | 1474.77 | 3.99 | 73.436 | 91.344 |
| ghostnet_100 | 1390.19 | 5.18 | 73.974 | 91.46 |
| tinynet_b | 1345.82 | 3.73 | 74.976 | 92.184 |
| tf_mobilenetv3_large_100 | 1325.06 | 5.48 | 75.518 | 92.604 |
| mnasnet_100 | 1183.69 | 4.38 | 74.658 | 92.112 |
| mobilenetv2_100 | 1101.58 | 3.5 | 72.97 | 91.02 |
| simplenetv1_9m_m1 | 1048.91 | 9.51 | 73.792 | 91.486 |
| resnet34 | 1030.4 | 21.8 | 75.114 | 92.284 |
| deit_tiny_patch16_224 | 990.85 | 5.72 | 72.172 | 91.114 |
| efficientnet_lite0 | 977.76 | 4.65 | 75.476 | 92.512 |
| simplenetv1_9m_m2 | 900.45 | 9.51 | 74.23 | 91.748 |
| tf_efficientnet_lite0 | 876.66 | 4.65 | 74.832 | 92.174 |
| dla34 | 834.35 | 15.74 | 74.62 | 92.072 |
| mobilenetv2_110d | 824.4 | 4.52 | 75.038 | 92.184 |
| resnet26 | 771.1 | 16 | 75.3 | 92.578 |
| repvgg_b0 | 751.01 | 15.82 | 75.16 | 92.418 |
| crossvit_9_240 | 606.2 | 8.55 | 73.96 | 91.968 |
| vgg11 | 576.32 | 132.86 | 69.028 | 88.626 |
| vit_base_patch32_224_sam | 561.99 | 88.22 | 73.694 | 91.01 |
| vgg11_bn | 504.29 | 132.87 | 70.36 | 89.802 |
| densenet121 | 435.3 | 7.98 | 75.584 | 92.652 |
| vgg13 | 363.69 | 133.05 | 69.926 | 89.246 |
| vgg13_bn | 315.85 | 133.05 | 71.594 | 90.376 |
| vgg16 | 302.84 | 138.36 | 71.59 | 90.382 |
| vgg16_bn | 265.99 | 138.37 | 73.35 | 91.504 |
| vgg19 | 259.82 | 143.67 | 72.366 | 90.87 |
| vgg19_bn | 229.77 | 143.68 | 74.214 | 91.848 |

Note: these benchmarks were run on a PC with a GTX 1080, PyTorch 1.11, fp32, and NCHW configuration.

I hope this is useful for the community.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@rwightman (Collaborator)

@Coderx7 thanks for the PR, looks like a decent lightweight model, but the big stack of layers in a single Sequential doesn't really line up with other timm models; it makes it hard to support many default features like feature extraction at strided stage boundaries, layer grouping, block-based grad checkpointing, etc.

Any chance you could organize the net into stem + stages[blocks[]]?
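For reference, a minimal sketch of that stem + stages[blocks[]] shape; the channel widths, depths, and downsample choices here are placeholders, not the SimpleNet config:

```python
import torch.nn as nn

class StagedNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # one nn.Sequential per stage; each strided stage starts with its
        # downsample layer, followed by the same-stride blocks
        self.stages = nn.Sequential(
            nn.Sequential(  # stage 0, overall reduction 4
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
            nn.Sequential(  # stage 1, overall reduction 8
                nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes))

    def forward(self, x):
        return self.head(self.stages(self.stem(x)))
```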

@Coderx7 (Author) commented Feb 17, 2023

@rwightman my pleasure. I tried to follow your VGG implementation and implement everything that was there.
I'm not familiar with the stem + stages structure; could you elaborate a bit more on this?

@rwightman (Collaborator)

@Coderx7 RexNet is probably the simplest example; ResNetV2 and RegNet are decent examples as well...

I also just refactored LeViT to use stages (for feature extraction support), and it's similar to this net in that there aren't strided convs, but rather a 'downsample' layer at the start of strided stages.

@rwightman (Collaborator)

So, looking at the net layout, two possible structures stand out:

```
stem:
      (128, 1, 0.0),
stage[0]
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
stage[1]
      ("p", 2, 0.0), 
      (320, 1, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
      (640, 1, 0.0),
stage[2]
      ("p", 2, 0.0),
      (2560, 1, 0.0, "k1"),
      (320, 1, 0.0, "k1"),
      (320, 1, 0.0),
head:
stem:
      (128, 1, 0.0),
stage[0]
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
stage[1]
      ("p", 2, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
stage[2]
      (640, 1, 0.0),
stage[3]
      ("p", 2, 0.0),
      (2560, 1, 0.0, "k1"),
stage[4]
      (320, 1, 0.0, "k1"),
      (320, 1, 0.0),
head:
```

@Coderx7 (Author) commented Feb 17, 2023

@rwightman Thanks a lot for the examples. I guess I'll give RexNet a try and hopefully get it refactored soon.

@Coderx7 (Author) commented Feb 17, 2023

@rwightman I got a bit confused doing the refactoring. Do you mind if I ask you questions while I try to refactor the architecture?
For a start, should the model look like this?
Also, how does timm handle the conversion of previous weights (the model state_dict) to the new form?

```
SimpleNet(
  (stem): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Dropout2d(p=0.0, inplace=False)
    )
  )
  (features): Sequential(
    (stage_0): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_2): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_3): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_4): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_1): SimpleBlock(
      (block): Sequential(
        (maxpool_0): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_2): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_3): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_2): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_3): SimpleBlock(
      (block): Sequential(
        (maxpool_0): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(2048, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_4): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
  )
  (head): ClassifierHead(
    (global_pool): SelectAdaptivePool2d (pool_type=max, flatten=Flatten(start_dim=1, end_dim=-1))
    (fc): Linear(in_features=256, out_features=1000, bias=True)
    (flatten): Identity()
  )
)
```

@rwightman (Collaborator)

@Coderx7 structure looks nice.

For conversion I usually write a fn called checkpoint_filter_fn.

Mapping a purely linear 0..num_model_layers state_dict to stages is going to be a bit of fun; you'll probably need to use a regex, finding a rule you can increment stage_idx on (i.e. every time the out dim changes). The last resort is to just iterate both state dicts together like the LeViT example and assume they line up (they should), asserting that the number of elements matches...
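A minimal sketch of that last-resort approach, assuming the flat checkpoint and the refactored model enumerate parameters in the same order (the function body here is illustrative, not the final PR code):

```python
def checkpoint_filter_fn(state_dict, model):
    """Remap a flat legacy checkpoint onto the refactored module names by
    zipping both key lists in order (sketch of the last-resort approach)."""
    model_keys = list(model.state_dict().keys())
    ckpt_keys = list(state_dict.keys())
    # both nets have identical layers, so the ordered key lists must line up
    assert len(model_keys) == len(ckpt_keys), 'param/buffer counts must match'
    return {new_k: state_dict[old_k] for new_k, old_k in zip(model_keys, ckpt_keys)}
```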

@rwightman (Collaborator)

That checkpoint filter should be passed to the builder, i.e. https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/levit.py#L765
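Roughly, following the linked LeViT example (a sketch: the helper name and out_indices default are assumptions, and SimpleNet/checkpoint_filter_fn are as defined in this PR):

```python
from timm.models.helpers import build_model_with_cfg  # import path varies across timm versions

def _create_simplenet(variant, pretrained=False, **kwargs):
    # pretrained_filter_fn remaps legacy checkpoints before loading
    out_indices = kwargs.pop('out_indices', (0, 1, 2, 3, 4))
    return build_model_with_cfg(
        SimpleNet, variant, pretrained,
        pretrained_filter_fn=checkpoint_filter_fn,
        feature_cfg=dict(flatten_sequential=True, out_indices=out_indices),
        **kwargs)
```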

@Coderx7 (Author) commented Feb 18, 2023

@rwightman I got the checkpoint working; however, for some reason, when I try the features_only argument during model creation, it crashes and complains that the return layers are not present in the model:

```
AssertionError: Return layers ({'features.stage_0.block.ConvBlock_0', 'features.stage_3.block.maxpool', 'features.stage_0.block.ConvBlock_2', 'features.stage_1.block.maxpool'}) are not present in model
```

What should I specify as the module name in the feature_info list? What is it looking for?
If it helps, this is how the model looks:

```
SimpleNet(
  (stem): ConvBNReLU(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (bn): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (dropout): Dropout2d(p=0.0, inplace=False)
    (relu): ReLU(inplace=True)
  )
  (features): Sequential(
    (stage_0): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_2): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_3): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_4): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_1): SimpleBlock(
      (block): Sequential(
        (maxpool): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_2): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_2): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_3): SimpleBlock(
      (block): Sequential(
        (maxpool): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(2048, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_4): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
  )
  (head): ClassifierHead(
    (global_pool): SelectAdaptivePool2d (pool_type=max, flatten=Flatten(start_dim=1, end_dim=-1))
    (fc): Linear(in_features=256, out_features=1000, bias=True)
    (flatten): Identity()
  )
)
```

feature_info:

```
[{'num_chs': 64, 'reduction': 2, 'module': 'stem'},
 {'num_chs': 128, 'reduction': 4, 'module': 'features.stage_0.block.ConvBlock_0'},
 {'num_chs': 128, 'reduction': 8, 'module': 'features.stage_0.block.ConvBlock_2'},
 {'num_chs': 128, 'reduction': 16, 'module': 'features.stage_1.block.maxpool'},
 {'num_chs': 512, 'reduction': 32, 'module': 'features.stage_3.block.maxpool'}]
```

and these are the state_dict keys:

```
stem.conv.weight
stem.conv.bias
stem.bn.weight
stem.bn.bias
stem.bn.running_mean
stem.bn.running_var
stem.bn.num_batches_tracked
features.stage_0.block.ConvBlock_0.conv.weight
features.stage_0.block.ConvBlock_0.conv.bias
features.stage_0.block.ConvBlock_0.bn.weight
features.stage_0.block.ConvBlock_0.bn.bias
features.stage_0.block.ConvBlock_0.bn.running_mean
features.stage_0.block.ConvBlock_0.bn.running_var
features.stage_0.block.ConvBlock_0.bn.num_batches_tracked
features.stage_0.block.ConvBlock_1.conv.weight
features.stage_0.block.ConvBlock_1.conv.bias
features.stage_0.block.ConvBlock_1.bn.weight
features.stage_0.block.ConvBlock_1.bn.bias
features.stage_0.block.ConvBlock_1.bn.running_mean
features.stage_0.block.ConvBlock_1.bn.running_var
features.stage_0.block.ConvBlock_1.bn.num_batches_tracked
features.stage_0.block.ConvBlock_2.conv.weight
features.stage_0.block.ConvBlock_2.conv.bias
features.stage_0.block.ConvBlock_2.bn.weight
features.stage_0.block.ConvBlock_2.bn.bias
features.stage_0.block.ConvBlock_2.bn.running_mean
features.stage_0.block.ConvBlock_2.bn.running_var
features.stage_0.block.ConvBlock_2.bn.num_batches_tracked
features.stage_0.block.ConvBlock_3.conv.weight
features.stage_0.block.ConvBlock_3.conv.bias
features.stage_0.block.ConvBlock_3.bn.weight
features.stage_0.block.ConvBlock_3.bn.bias
features.stage_0.block.ConvBlock_3.bn.running_mean
features.stage_0.block.ConvBlock_3.bn.running_var
features.stage_0.block.ConvBlock_3.bn.num_batches_tracked
features.stage_0.block.ConvBlock_4.conv.weight
features.stage_0.block.ConvBlock_4.conv.bias
features.stage_0.block.ConvBlock_4.bn.weight
features.stage_0.block.ConvBlock_4.bn.bias
features.stage_0.block.ConvBlock_4.bn.running_mean
features.stage_0.block.ConvBlock_4.bn.running_var
features.stage_0.block.ConvBlock_4.bn.num_batches_tracked
features.stage_1.block.ConvBlock_0.conv.weight
features.stage_1.block.ConvBlock_0.conv.bias
features.stage_1.block.ConvBlock_0.bn.weight
features.stage_1.block.ConvBlock_0.bn.bias
features.stage_1.block.ConvBlock_0.bn.running_mean
features.stage_1.block.ConvBlock_0.bn.running_var
features.stage_1.block.ConvBlock_0.bn.num_batches_tracked
features.stage_1.block.ConvBlock_1.conv.weight
features.stage_1.block.ConvBlock_1.conv.bias
features.stage_1.block.ConvBlock_1.bn.weight
features.stage_1.block.ConvBlock_1.bn.bias
features.stage_1.block.ConvBlock_1.bn.running_mean
features.stage_1.block.ConvBlock_1.bn.running_var
features.stage_1.block.ConvBlock_1.bn.num_batches_tracked
features.stage_1.block.ConvBlock_2.conv.weight
features.stage_1.block.ConvBlock_2.conv.bias
features.stage_1.block.ConvBlock_2.bn.weight
features.stage_1.block.ConvBlock_2.bn.bias
features.stage_1.block.ConvBlock_2.bn.running_mean
features.stage_1.block.ConvBlock_2.bn.running_var
features.stage_1.block.ConvBlock_2.bn.num_batches_tracked
features.stage_2.block.ConvBlock_0.conv.weight
features.stage_2.block.ConvBlock_0.conv.bias
features.stage_2.block.ConvBlock_0.bn.weight
features.stage_2.block.ConvBlock_0.bn.bias
features.stage_2.block.ConvBlock_0.bn.running_mean
features.stage_2.block.ConvBlock_0.bn.running_var
features.stage_2.block.ConvBlock_0.bn.num_batches_tracked
features.stage_3.block.ConvBlock_0.conv.weight
features.stage_3.block.ConvBlock_0.conv.bias
features.stage_3.block.ConvBlock_0.bn.weight
features.stage_3.block.ConvBlock_0.bn.bias
features.stage_3.block.ConvBlock_0.bn.running_mean
features.stage_3.block.ConvBlock_0.bn.running_var
features.stage_3.block.ConvBlock_0.bn.num_batches_tracked
features.stage_4.block.ConvBlock_0.conv.weight
features.stage_4.block.ConvBlock_0.conv.bias
features.stage_4.block.ConvBlock_0.bn.weight
features.stage_4.block.ConvBlock_0.bn.bias
features.stage_4.block.ConvBlock_0.bn.running_mean
features.stage_4.block.ConvBlock_0.bn.running_var
features.stage_4.block.ConvBlock_0.bn.num_batches_tracked
features.stage_4.block.ConvBlock_1.conv.weight
features.stage_4.block.ConvBlock_1.conv.bias
features.stage_4.block.ConvBlock_1.bn.weight
features.stage_4.block.ConvBlock_1.bn.bias
features.stage_4.block.ConvBlock_1.bn.running_mean
features.stage_4.block.ConvBlock_1.bn.running_var
features.stage_4.block.ConvBlock_1.bn.num_batches_tracked
head.fc.weight
head.fc.bias
```

@rwightman (Collaborator)

@Coderx7 feature_info should be filled with the module name of the 'deepest' layer for a given stride, so usually the nn.Module right before a downsample layer. In this case, you'd want stem, features.stage_0, features.stage_2, features.stage_4... and I just noticed there is a stride of 2 on ConvBlock_2 of stage_0; if that's supposed to be there, it should split into a different stage (stages are delimited by strided layers and, in many cases, shifts in width).

@Coderx7 (Author) commented Feb 19, 2023

@rwightman Thanks, but there are two things here. First, I believe I did just that but still got the same error anyway; I'll give it another try and see how it goes.
Second, concerning the stages: this architecture basically allows dynamic strides on any layer, but especially the first four. (I can remove that and make it static, as there are only two pretrained variants with two stride modes!)
The two trained variants use mode 1 and mode 2 strides, which downsample the early layers at a specific rate, so during ImageNet training you get some leverage over the performance/accuracy ratio in its simplest form.
For example, this one uses strides of 2,2,1,2 and another variant uses 2,2,2, with the rest all 1s.
If I create stages based on the downsampling of features, should stem, layer 1, and layer 3 all be in unique stages, like stage 1 to stage 2 (excluding the stem)?

@rwightman (Collaborator)

In the model create helper you should enable flatten_sequential and ensure the default number of out_indices matches the net:

```python
out_indices = kwargs.pop('out_indices', (0, 1, 2, 3))
model = build_model_with_cfg(
    EfficientFormerV2, variant, pretrained,
    feature_cfg=dict(flatten_sequential=True, out_indices=out_indices),
    **kwargs)
```

Most models have some sort of pattern and systematic spacing between the strided layers, so I figured that'd be the same here for the configs. I realize they could be put anywhere, but it doesn't seem that useful to have no depth between strides.

The concept of a stage is essentially to encapsulate the layers at the same stride; sometimes there are stages without any stride change but with a different width, conv type (depthwise vs not), or another trait in common across all layer repeats in the stage.

@Coderx7 (Author) commented Feb 19, 2023

@rwightman Thanks a lot. That's a fair point; however, this net was never meant to scale that way. It was designed with something completely different in mind: to show how one could maximize a network's performance under constraints (fixed param count, depth, and basic operators) while keeping everything simple and not resorting to any complex strategies.

That said, I seem to have done pretty much everything, and the only remaining issue is that the last stage has a bigger feature map size (thus a smaller reduction) than its predecessor, which timm seems to have issues with.
Currently this is how my feature_info looks:

```
[{'num_chs': 64, 'reduction': 2, 'module': 'stem'},
 {'num_chs': 128, 'reduction': 4, 'module': 'features.stage_0'},
 {'num_chs': 128, 'reduction': 8, 'module': 'features.stage_1'},
 {'num_chs': 512, 'reduction': 16, 'module': 'features.stage_2'},
 {'num_chs': 2048, 'reduction': 24, 'module': 'features.stage_3'},
 {'num_chs': 256, 'reduction': 20, 'module': 'features.stage_4'}]
```

How should I handle this, other than merging the last two stages?
Thanks a lot in advance.

@Coderx7 (Author) commented Feb 21, 2023

@rwightman would you kindly have a look here and tell me what to do for the last part? thanks

@rwightman (Collaborator)

@Coderx7 reduction is the spatial reduction (from the input image size); it's only complained about if it decreases. It's not used directly by timm, but some downstream users want it for calculating interpolation ratios.

If you look at the RexNet example, the reduction should *= 2 every time there is a strided layer; the majority of ImageNet networks are stride 32. num_chs has no restrictions on increasing or decreasing, though.
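A toy illustration of that bookkeeping; the (out_chs, stride) config below is made up, not a real SimpleNet variant:

```python
stage_cfg = [(128, 2), (256, 2), (512, 1), (2048, 2)]  # (out_chs, stride), made up
feature_info = [dict(num_chs=64, reduction=2, module='stem')]
reduction = 2  # stride accumulated so far (stem is stride 2)
for i, (out_chs, stride) in enumerate(stage_cfg):
    reduction *= stride  # only doubles at strided stages
    feature_info.append(
        dict(num_chs=out_chs, reduction=reduction, module=f'features.stage_{i}'))
# reductions come out as 4, 8, 8, 16: monotonically non-decreasing
```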

@Coderx7 (Author) commented Feb 21, 2023

@rwightman I thought the idea was to provide feature maps of different sizes for downstream usage, not to capture only the stride-2 points per se.
Currently, if the assert in

```python
assert 'reduction' in fi and fi['reduction'] >= prev_reduction
```

is not disabled, this won't work.
So I need to do one of the following:

1. Have 4 stages and only include reduction rates for 3 of them (that is, don't include the last stage's reduction rate in feature_info).
2. Have 3 stages, merging the last two (3 and 4), for 3 stages in total with 3 reduction rates.
3. Alter the FeatureInfo class to take a new argument that allows cases like this.

- The issue with the first option is that users will lose the last two layers of the network if they opt to use features_only; other than that, normal usage stays the same.
- The issue with the second option is that users can't fully experiment with stage 4; they'd have to split it out manually, which nullifies the purpose of features_only, I guess.
- The last option seems like a good idea to me: with a default value that works for all current models, the current behavior is maintained while also allowing cases like this. Unless that check has more significance and affects lots of other parts of the library that I'm not aware of yet.

So which option should I take so I can hopefully finish this up?
Thanks a lot in advance.

@Coderx7 (Author) commented Feb 23, 2023

@rwightman I'd really appreciate it if you could kindly have a look and decide on the next step, so I can finalize the changes accordingly and have this finished.

@rwightman (Collaborator)

@Coderx7 sorry, I have a lot on my plate right now, wrapping up a few things before I'm on vacation for a bit. I'm going to have to leave this one hanging for a while, as I don't think we're on the same page.

The net is simple, as per its name, and I didn't see any merging, upscaling, or anything else that could result in a feature map increasing in size; it reduces by 2 at each downscale. I feel we're lost in semantics.

@Coderx7 (Author) commented Feb 23, 2023

@rwightman out of the last three conv layers, two (the 2048- and 256-channel ones) have kernel_size=1 but use a padding of 1. That causes the feature map size to increase from 7x7 (after the downsampling) to 9x9 (after the first 1x1 conv), and the next 1x1 conv increases that to 11x11, which is why the effective reduction varies that way.
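That follows from the usual conv output-size formula, H_out = floor((H_in + 2 * pad - k) / stride) + 1; a standalone check of the two 1x1, padding-1 layers (demo only, not the PR code):

```python
import torch
import torch.nn as nn

# with k=1, pad=1, stride=1 each layer grows the map by 2 pixels
# (one zero-pixel per side): 7 -> 9 -> 11
x = torch.randn(1, 512, 7, 7)
conv1 = nn.Conv2d(512, 2048, kernel_size=1, stride=1, padding=1)
conv2 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=1)
y = conv1(x)
print(y.shape)         # torch.Size([1, 2048, 9, 9])
print(conv2(y).shape)  # torch.Size([1, 256, 11, 11])
```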
OK, no problem; please take your time and let's continue this when you are free.
I really appreciate you taking the time despite your busy schedule.

@Coderx7 (Author) commented Mar 13, 2023

@rwightman May I ask if your vacation is over and if we can hopefully get this last step worked out?

@rwightman (Collaborator)

@Coderx7 I've been trying to get on top of my own tasks since getting back. I looked at this a bit more, and I don't really like the padding issue that is the reason for the expanding dims; having a padding of 1 for a 1x1 conv makes zero sense to me. It adds data to the signal path that isn't meaningful. So I'm hesitant to add the model at all with quirks like that present...

@Coderx7 (Author) commented Mar 17, 2023

@rwightman Thanks, I really appreciate it, knowing how busy your schedule is.
It's not really any different from using (zero-)padding on the input.
This happened by accident, but after I noticed it, the padded versions performed better than the no-padding versions in a few experiments I ran afterward; it looked to me as if it creates a kind of regularization effect.
I can run more experiments to further validate this point (or the lack thereof, if that turns out to be the case), if that's your concern.
My main concern is that it takes a lot of time to train these models again (it took me several months, as I don't have access to anything powerful, just a single GPU), but I'll try my best to address your concerns.

@rwightman (Collaborator)

@Coderx7 in deep learning it seems almost any extra activations (or parameters) can and will be used to improve the loss during optimization, but I'd argue not in particularly useful ways (and possibly harmful ones for segmentation/object detection, as they'd add a 'border' effect at the feature level). They get blended back into the signal via the subsequent 3x3 conv. I did test these, and per the goal of running fast, the extra padding does have a measurable speed impact (not significant, but there).

The rest of the net is fine: simple, as per the name, which isn't a bad thing to have in timm, as such nets can be the best option for some tasks. If the padding issue is fixed (padding == kernel_size // 2 should do fine for this net) and the models are retrained, I'd definitely include it with the tweaks mentioned.

Do you have hparams for these? I have two idle 2x Titan RTX machines right now; I could put them to work if you push any outstanding arch changes to this PR.
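A sketch of that padding rule applied to a ConvBNReLU-style helper; the helper name, arguments, and momentum value mirror the printouts earlier in the thread, and the actual PR code may differ:

```python
import torch.nn as nn

# padding = kernel_size // 2 keeps spatial size unchanged at stride 1
# for odd kernels: 1x1 -> pad 0, 3x3 -> pad 1
def conv_bn_relu(in_chs, out_chs, kernel_size=3, stride=1, drop_rate=0.):
    return nn.Sequential(
        nn.Conv2d(in_chs, out_chs, kernel_size, stride,
                  padding=kernel_size // 2),
        nn.BatchNorm2d(out_chs, momentum=0.05),
        nn.ReLU(inplace=True),
        nn.Dropout2d(drop_rate),
    )
```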

@Coderx7

This comment was marked as resolved.

@Coderx7

This comment was marked as outdated.

    the last two 1x1 convs now use no padding; this is done to bring
    the architecture in line with timm's standards.
    Because of this change the pretrained weights are no longer valid
    and the model needs to be retrained.
These are the updated ImageNet pretrained weights with improved accuracy.
@Coderx7 (Author) commented Apr 14, 2023

@rwightman Hi, hope you are doing great.
I finally finished training the new weights and have just updated the PR.
Would you please kindly tell me what you think?
Thanks a lot in advance.

@Coderx7 (Author) commented Jul 25, 2023

@rwightman it's been a few months since my last changes; could you kindly tell me if everything is OK or if I'm missing something here?
I'd really like to make this happen, if you're willing, of course.
Thanks a lot in advance.
