add simplenet architecture #1679 (Open)

Coderx7 wants to merge 3 commits into main
Conversation

@Coderx7 commented Feb 16, 2023

This pull request adds the SimpleNet architecture. SimpleNetV1 is a 2016 architecture composed of only the most basic operators, forming a plain CNN. It outperformed many deeper and more complex architectures such as VGGNet and ResNet on several benchmark datasets. Below are its results on the ImageNet dataset.

- added simplenet.py to timm/models
- added simplenet.md to docs/models
- added an entry to docs/models.md

Here is some more information on how they perform, taken from our official PyTorch repository:

| Model | #Params | ImageNet (top1 / top5) | ImageNet-Real-Labels (top1 / top5) |
|---|---|---|---|
| simplenetv1_9m_m2 (36.3 MB) | 9.5m | 74.23 / 91.748 | 81.22 / 94.756 |
| simplenetv1_5m_m2 (22 MB) | 5.7m | 72.03 / 90.324 | 79.328 / 93.714 |
| simplenetv1_small_m2_075 (12.6 MB) | 3m | 68.506 / 88.15 | 76.283 / 92.02 |
| simplenetv1_small_m2_05 (5.78 MB) | 1.5m | 61.67 / 83.488 | 69.31 / 88.195 |

SimpleNet performs very decently: it outperforms VGGNet, variants of ResNet, and MobileNets (v1 through v3), and it's pretty fast as well, all using a plain old CNN!

Here's an example benchmark run on the small variants of SimpleNet and some other well-known architectures such as the MobileNets.
The small variants of SimpleNet consistently achieve a strong speed/accuracy trade-off:

| model | samples_per_sec | param_count | top1 | top5 |
|---|---|---|---|---|
| simplenetv1_small_m1_05 | 3100.26 | 1.51 | 61.122 | 82.988 |
| mobilenetv3_small_050 | 3082.85 | 1.59 | 57.89 | 80.194 |
| lcnet_050 | 2713.02 | 1.88 | 63.1 | 84.382 |
| simplenetv1_small_m2_05 | 2536.16 | 1.51 | 61.67 | 83.488 |
| mobilenetv3_small_075 | 1793.42 | 2.04 | 65.242 | 85.438 |
| tf_mobilenetv3_small_075 | 1689.53 | 2.04 | 65.714 | 86.134 |
| simplenetv1_small_m1_075 | 1626.87 | 3.29 | 67.784 | 87.718 |
| tf_mobilenetv3_small_minimal_100 | 1316.91 | 2.04 | 62.908 | 84.234 |
| simplenetv1_small_m2_075 | 1313.6 | 3.29 | 68.506 | 88.15 |
| mobilenetv3_small_100 | 1261.09 | 2.54 | 67.656 | 87.634 |
| tf_mobilenetv3_small_100 | 1213.03 | 2.54 | 67.924 | 87.664 |
| mnasnet_small | 1089.33 | 2.03 | 66.206 | 86.508 |
| mobilenetv2_050 | 857.66 | 1.97 | 65.942 | 86.082 |
| dla46_c | 537.08 | 1.3 | 64.866 | 86.294 |
| dla46x_c | 323.03 | 1.07 | 65.97 | 86.98 |
| dla60x_c | 301.71 | 1.32 | 67.892 | 88.426 |

And this is a sample for larger models:

| model | samples_per_sec | param_count | top1 | top5 |
|---|---|---|---|---|
| simplenetv1_small_m1_075 | 2893.91 | 3.29 | 67.784 | 87.718 |
| simplenetv1_small_m2_075 | 2478.41 | 3.29 | 68.506 | 88.15 |
| vit_tiny_r_s16_p8_224 | 2337.23 | 6.34 | 71.792 | 90.822 |
| simplenetv1_5m_m1 | 2105.06 | 5.75 | 71.548 | 89.94 |
| simplenetv1_5m_m2 | 1754.25 | 5.75 | 72.03 | 90.324 |
| resnet18 | 1750.38 | 11.69 | 69.744 | 89.082 |
| regnetx_006 | 1620.25 | 6.2 | 73.86 | 91.672 |
| mobilenetv3_large_100 | 1491.86 | 5.48 | 75.766 | 92.544 |
| tf_mobilenetv3_large_minimal_100 | 1476.29 | 3.92 | 72.25 | 90.63 |
| tf_mobilenetv3_large_075 | 1474.77 | 3.99 | 73.436 | 91.344 |
| ghostnet_100 | 1390.19 | 5.18 | 73.974 | 91.46 |
| tinynet_b | 1345.82 | 3.73 | 74.976 | 92.184 |
| tf_mobilenetv3_large_100 | 1325.06 | 5.48 | 75.518 | 92.604 |
| mnasnet_100 | 1183.69 | 4.38 | 74.658 | 92.112 |
| mobilenetv2_100 | 1101.58 | 3.5 | 72.97 | 91.02 |
| simplenetv1_9m_m1 | 1048.91 | 9.51 | 73.792 | 91.486 |
| resnet34 | 1030.4 | 21.8 | 75.114 | 92.284 |
| deit_tiny_patch16_224 | 990.85 | 5.72 | 72.172 | 91.114 |
| efficientnet_lite0 | 977.76 | 4.65 | 75.476 | 92.512 |
| simplenetv1_9m_m2 | 900.45 | 9.51 | 74.23 | 91.748 |
| tf_efficientnet_lite0 | 876.66 | 4.65 | 74.832 | 92.174 |
| dla34 | 834.35 | 15.74 | 74.62 | 92.072 |
| mobilenetv2_110d | 824.4 | 4.52 | 75.038 | 92.184 |
| resnet26 | 771.1 | 16 | 75.3 | 92.578 |
| repvgg_b0 | 751.01 | 15.82 | 75.16 | 92.418 |
| crossvit_9_240 | 606.2 | 8.55 | 73.96 | 91.968 |
| vgg11 | 576.32 | 132.86 | 69.028 | 88.626 |
| vit_base_patch32_224_sam | 561.99 | 88.22 | 73.694 | 91.01 |
| vgg11_bn | 504.29 | 132.87 | 70.36 | 89.802 |
| densenet121 | 435.3 | 7.98 | 75.584 | 92.652 |
| vgg13 | 363.69 | 133.05 | 69.926 | 89.246 |
| vgg13_bn | 315.85 | 133.05 | 71.594 | 90.376 |
| vgg16 | 302.84 | 138.36 | 71.59 | 90.382 |
| vgg16_bn | 265.99 | 138.37 | 73.35 | 91.504 |
| vgg19 | 259.82 | 143.67 | 72.366 | 90.87 |
| vgg19_bn | 229.77 | 143.68 | 74.214 | 91.848 |

Note: these benchmarks were run on a PC with a GTX 1080, PyTorch 1.11, fp32, and NCHW configuration.

I hope this is useful for the community.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@rwightman (Collaborator)

@Coderx7 thanks for the PR, looks like a decent lightweight model, but the big stack of layers in a single Sequential doesn't really line up with other timm models; it makes it hard to support many default features like feature extraction at strided stage boundaries, layer grouping, block-based grad checkpointing, etc.

Any chance you could organize the net into stem + stages[blocks[]]?
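For reference, a minimal sketch of that stem + stages[blocks[]] shape; the channel widths, depths, and downsample choices here are placeholders, not the SimpleNet config:

```python
import torch.nn as nn

class StagedNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # one nn.Sequential per stage; each strided stage starts with its
        # downsample layer, followed by the same-stride blocks
        self.stages = nn.Sequential(
            nn.Sequential(  # stage 0, overall reduction 4
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
            nn.Sequential(  # stage 1, overall reduction 8
                nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_classes))

    def forward(self, x):
        return self.head(self.stages(self.stem(x)))
```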

@Coderx7 (Author) commented Feb 17, 2023

@rwightman my pleasure. I tried to follow your VGG implementation and implement everything that was there.
I'm not familiar with the stem + stages structure; could you elaborate a bit more on this?

@rwightman (Collaborator)

@Coderx7 RexNet is probably the simplest example; ResNetV2 and RegNet are decent examples as well...

I also just refactored LeViT to use stages (for feature extraction support), and it's similar to this net in that there aren't strided convs, but rather a 'downsample' layer at the start of strided stages.

@rwightman (Collaborator)

So, looking at the net layout, two possible structures stand out:

```
stem:
      (128, 1, 0.0),
stage[0]
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
stage[1]
      ("p", 2, 0.0), 
      (320, 1, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
      (640, 1, 0.0),
stage[2]
      ("p", 2, 0.0),
      (2560, 1, 0.0, "k1"),
      (320, 1, 0.0, "k1"),
      (320, 1, 0.0),
head:
stem:
      (128, 1, 0.0),
stage[0]
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
      (192, 1, 0.0),
stage[1]
      ("p", 2, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
      (320, 1, 0.0),
stage[2]
      (640, 1, 0.0),
stage[3]
      ("p", 2, 0.0),
      (2560, 1, 0.0, "k1"),
stage[4]
      (320, 1, 0.0, "k1"),
      (320, 1, 0.0),
head:
```

@Coderx7 (Author) commented Feb 17, 2023

@rwightman Thanks a lot for the examples. I guess I'll give RexNet a try and hopefully get it refactored soon.

@Coderx7 (Author) commented Feb 17, 2023

@rwightman I got a bit confused doing the refactoring. Do you mind if I ask you questions while I try to refactor the architecture?
For a start, should the model look like this?
Also, how does timm handle the conversion of previous weights (the model state_dict) to the new form?

```
SimpleNet(
  (stem): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Dropout2d(p=0.0, inplace=False)
    )
  )
  (features): Sequential(
    (stage_0): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_2): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_3): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_4): Sequential(
          (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_1): SimpleBlock(
      (block): Sequential(
        (maxpool_0): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_2): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_3): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_2): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_3): SimpleBlock(
      (block): Sequential(
        (maxpool_0): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(2048, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
    (stage_4): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): Sequential(
          (0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
        (ConvBlock_1): Sequential(
          (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
          (3): Dropout2d(p=0.0, inplace=False)
        )
      )
    )
  )
  (head): ClassifierHead(
    (global_pool): SelectAdaptivePool2d (pool_type=max, flatten=Flatten(start_dim=1, end_dim=-1))
    (fc): Linear(in_features=256, out_features=1000, bias=True)
    (flatten): Identity()
  )
)
```

@rwightman (Collaborator)

@Coderx7 structure looks nice.

For conversion I usually write a fn called checkpoint_filter_fn.

Mapping a purely linear 0..num_model_layers state_dict to stages is going to be a bit of fun; you'll probably need to use a regex, finding a rule you can increment stage_idx on (i.e. every time the out dim changes). The last resort is to just iterate both state dicts together like the LeViT example and assume they line up (they should), asserting that the number of elements matches...
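A minimal sketch of that last-resort approach, assuming the flat checkpoint and the refactored model enumerate parameters in the same order (the function body here is illustrative, not the final PR code):

```python
def checkpoint_filter_fn(state_dict, model):
    """Remap a flat legacy checkpoint onto the refactored module names by
    zipping both key lists in order (sketch of the last-resort approach)."""
    model_keys = list(model.state_dict().keys())
    ckpt_keys = list(state_dict.keys())
    # both nets have identical layers, so the ordered key lists must line up
    assert len(model_keys) == len(ckpt_keys), 'param/buffer counts must match'
    return {new_k: state_dict[old_k] for new_k, old_k in zip(model_keys, ckpt_keys)}
```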

@rwightman (Collaborator)

That checkpoint filter should be passed to the builder, i.e. https://github.com/rwightman/pytorch-image-models/blob/main/timm/models/levit.py#L765
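Roughly, following the linked LeViT example (a sketch: the helper name and out_indices default are assumptions, and SimpleNet/checkpoint_filter_fn are as defined in this PR):

```python
from timm.models.helpers import build_model_with_cfg  # import path varies across timm versions

def _create_simplenet(variant, pretrained=False, **kwargs):
    # pretrained_filter_fn remaps legacy checkpoints before loading
    out_indices = kwargs.pop('out_indices', (0, 1, 2, 3, 4))
    return build_model_with_cfg(
        SimpleNet, variant, pretrained,
        pretrained_filter_fn=checkpoint_filter_fn,
        feature_cfg=dict(flatten_sequential=True, out_indices=out_indices),
        **kwargs)
```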

@Coderx7 (Author) commented Feb 18, 2023

@rwightman I got the checkpoint working; however, for some reason, when I try the features_only argument during model creation, it crashes and complains that the return layers are not present in the model:

```
AssertionError: Return layers ({'features.stage_0.block.ConvBlock_0', 'features.stage_3.block.maxpool', 'features.stage_0.block.ConvBlock_2', 'features.stage_1.block.maxpool'}) are not present in model
```

What should I specify as the module name in the feature_info list? What is it looking for?
If it helps, this is how the model looks:

```
SimpleNet(
  (stem): ConvBNReLU(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (bn): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (dropout): Dropout2d(p=0.0, inplace=False)
    (relu): ReLU(inplace=True)
  )
  (features): Sequential(
    (stage_0): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_2): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_3): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_4): ConvBNReLU(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_1): SimpleBlock(
      (block): Sequential(
        (maxpool): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_2): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_2): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_3): SimpleBlock(
      (block): Sequential(
        (maxpool): Sequential(
          (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
          (1): Dropout2d(p=0.0, inplace=True)
        )
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(2048, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (stage_4): SimpleBlock(
      (block): Sequential(
        (ConvBlock_0): ConvBNReLU(
          (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
        (ConvBlock_1): ConvBNReLU(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
          (dropout): Dropout2d(p=0.0, inplace=False)
          (relu): ReLU(inplace=True)
        )
      )
    )
  )
  (head): ClassifierHead(
    (global_pool): SelectAdaptivePool2d (pool_type=max, flatten=Flatten(start_dim=1, end_dim=-1))
    (fc): Linear(in_features=256, out_features=1000, bias=True)
    (flatten): Identity()
  )
)
```

feature_info:

```
[{'num_chs': 64, 'reduction': 2, 'module': 'stem'},
 {'num_chs': 128, 'reduction': 4, 'module': 'features.stage_0.block.ConvBlock_0'},
 {'num_chs': 128, 'reduction': 8, 'module': 'features.stage_0.block.ConvBlock_2'},
 {'num_chs': 128, 'reduction': 16, 'module': 'features.stage_1.block.maxpool'},
 {'num_chs': 512, 'reduction': 32, 'module': 'features.stage_3.block.maxpool'}]
```

and these are the state_dict keys:

```
stem.conv.weight
stem.conv.bias
stem.bn.weight
stem.bn.bias
stem.bn.running_mean
stem.bn.running_var
stem.bn.num_batches_tracked
features.stage_0.block.ConvBlock_0.conv.weight
features.stage_0.block.ConvBlock_0.conv.bias
features.stage_0.block.ConvBlock_0.bn.weight
features.stage_0.block.ConvBlock_0.bn.bias
features.stage_0.block.ConvBlock_0.bn.running_mean
features.stage_0.block.ConvBlock_0.bn.running_var
features.stage_0.block.ConvBlock_0.bn.num_batches_tracked
features.stage_0.block.ConvBlock_1.conv.weight
features.stage_0.block.ConvBlock_1.conv.bias
features.stage_0.block.ConvBlock_1.bn.weight
features.stage_0.block.ConvBlock_1.bn.bias
features.stage_0.block.ConvBlock_1.bn.running_mean
features.stage_0.block.ConvBlock_1.bn.running_var
features.stage_0.block.ConvBlock_1.bn.num_batches_tracked
features.stage_0.block.ConvBlock_2.conv.weight
features.stage_0.block.ConvBlock_2.conv.bias
features.stage_0.block.ConvBlock_2.bn.weight
features.stage_0.block.ConvBlock_2.bn.bias
features.stage_0.block.ConvBlock_2.bn.running_mean
features.stage_0.block.ConvBlock_2.bn.running_var
features.stage_0.block.ConvBlock_2.bn.num_batches_tracked
features.stage_0.block.ConvBlock_3.conv.weight
features.stage_0.block.ConvBlock_3.conv.bias
features.stage_0.block.ConvBlock_3.bn.weight
features.stage_0.block.ConvBlock_3.bn.bias
features.stage_0.block.ConvBlock_3.bn.running_mean
features.stage_0.block.ConvBlock_3.bn.running_var
features.stage_0.block.ConvBlock_3.bn.num_batches_tracked
features.stage_0.block.ConvBlock_4.conv.weight
features.stage_0.block.ConvBlock_4.conv.bias
features.stage_0.block.ConvBlock_4.bn.weight
features.stage_0.block.ConvBlock_4.bn.bias
features.stage_0.block.ConvBlock_4.bn.running_mean
features.stage_0.block.ConvBlock_4.bn.running_var
features.stage_0.block.ConvBlock_4.bn.num_batches_tracked
features.stage_1.block.ConvBlock_0.conv.weight
features.stage_1.block.ConvBlock_0.conv.bias
features.stage_1.block.ConvBlock_0.bn.weight
features.stage_1.block.ConvBlock_0.bn.bias
features.stage_1.block.ConvBlock_0.bn.running_mean
features.stage_1.block.ConvBlock_0.bn.running_var
features.stage_1.block.ConvBlock_0.bn.num_batches_tracked
features.stage_1.block.ConvBlock_1.conv.weight
features.stage_1.block.ConvBlock_1.conv.bias
features.stage_1.block.ConvBlock_1.bn.weight
features.stage_1.block.ConvBlock_1.bn.bias
features.stage_1.block.ConvBlock_1.bn.running_mean
features.stage_1.block.ConvBlock_1.bn.running_var
features.stage_1.block.ConvBlock_1.bn.num_batches_tracked
features.stage_1.block.ConvBlock_2.conv.weight
features.stage_1.block.ConvBlock_2.conv.bias
features.stage_1.block.ConvBlock_2.bn.weight
features.stage_1.block.ConvBlock_2.bn.bias
features.stage_1.block.ConvBlock_2.bn.running_mean
features.stage_1.block.ConvBlock_2.bn.running_var
features.stage_1.block.ConvBlock_2.bn.num_batches_tracked
features.stage_2.block.ConvBlock_0.conv.weight
features.stage_2.block.ConvBlock_0.conv.bias
features.stage_2.block.ConvBlock_0.bn.weight
features.stage_2.block.ConvBlock_0.bn.bias
features.stage_2.block.ConvBlock_0.bn.running_mean
features.stage_2.block.ConvBlock_0.bn.running_var
features.stage_2.block.ConvBlock_0.bn.num_batches_tracked
features.stage_3.block.ConvBlock_0.conv.weight
features.stage_3.block.ConvBlock_0.conv.bias
features.stage_3.block.ConvBlock_0.bn.weight
features.stage_3.block.ConvBlock_0.bn.bias
features.stage_3.block.ConvBlock_0.bn.running_mean
features.stage_3.block.ConvBlock_0.bn.running_var
features.stage_3.block.ConvBlock_0.bn.num_batches_tracked
features.stage_4.block.ConvBlock_0.conv.weight
features.stage_4.block.ConvBlock_0.conv.bias
features.stage_4.block.ConvBlock_0.bn.weight
features.stage_4.block.ConvBlock_0.bn.bias
features.stage_4.block.ConvBlock_0.bn.running_mean
features.stage_4.block.ConvBlock_0.bn.running_var
features.stage_4.block.ConvBlock_0.bn.num_batches_tracked
features.stage_4.block.ConvBlock_1.conv.weight
features.stage_4.block.ConvBlock_1.conv.bias
features.stage_4.block.ConvBlock_1.bn.weight
features.stage_4.block.ConvBlock_1.bn.bias
features.stage_4.block.ConvBlock_1.bn.running_mean
features.stage_4.block.ConvBlock_1.bn.running_var
features.stage_4.block.ConvBlock_1.bn.num_batches_tracked
head.fc.weight
head.fc.bias
```

@rwightman (Collaborator)

@Coderx7 feature_info should be filled with the module name of the 'deepest' layer for a given stride, so usually the nn.Module right before a downsample layer. In this case, you'd want stem, features.stage_0, features.stage_2, features.stage_4... and I just noticed there is a stride of 2 on ConvBlock_2 of stage_0; if that's supposed to be there, it should split into a different stage (stages are delimited by strided layers and, in many cases, shifts in width).

@Coderx7 (Author) commented Feb 19, 2023

@rwightman Thanks, but there are two things here. First, I believe I did just that but still got the same error anyway; I'll give it another try and see how it goes.
Second, concerning the stages: this architecture basically allows dynamic strides on any layer, but especially the first four. (I can remove that and make it static, as there are only two pretrained variants with two stride modes!)
The two trained variants use mode 1 and mode 2 strides, which downsample the early layers at a specific rate, so during ImageNet training you get some leverage over the performance/accuracy ratio in its simplest form.
For example, this one uses strides of 2,2,1,2 and another variant uses 2,2,2, with the rest all 1s.
If I create stages based on the downsampling of features, should stem, layer 1, and layer 3 all be in unique stages, like stage 1 to stage 2 (excluding the stem)?

@rwightman (Collaborator)

In the model create helper you should enable flatten_sequential and ensure the default number of out_indices matches the net:

```python
out_indices = kwargs.pop('out_indices', (0, 1, 2, 3))
model = build_model_with_cfg(
    EfficientFormerV2, variant, pretrained,
    feature_cfg=dict(flatten_sequential=True, out_indices=out_indices),
    **kwargs)
```

Most models have some sort of pattern and systematic spacing between the strided layers, so I figured that'd be the same here for the configs. I realize they could be put anywhere, but it doesn't seem that useful to have no depth between strides.

The concept of a stage is essentially to encapsulate the layers at the same stride; sometimes there are stages without any stride change but with a different width, conv type (depthwise vs not), or another trait in common across all layer repeats in the stage.

@Coderx7 (Author) commented Feb 19, 2023

@rwightman Thanks a lot. That's a fair point; however, this net was never meant to scale that way. It was designed with something completely different in mind: to show how one could maximize a network's performance under constraints (fixed param count, depth, and basic operators) while keeping everything simple and not resorting to any complex strategies.

That said, I seem to have done pretty much everything, and the only remaining issue is that the last stage has a bigger feature map size (thus a smaller reduction) than its predecessor, which timm seems to have issues with.
Currently this is how my feature_info looks:

```
[{'num_chs': 64, 'reduction': 2, 'module': 'stem'},
 {'num_chs': 128, 'reduction': 4, 'module': 'features.stage_0'},
 {'num_chs': 128, 'reduction': 8, 'module': 'features.stage_1'},
 {'num_chs': 512, 'reduction': 16, 'module': 'features.stage_2'},
 {'num_chs': 2048, 'reduction': 24, 'module': 'features.stage_3'},
 {'num_chs': 256, 'reduction': 20, 'module': 'features.stage_4'}]
```

How should I handle this, other than merging the last two stages?
Thanks a lot in advance.

@Coderx7 (Author) commented Feb 21, 2023

@rwightman would you kindly have a look here and tell me what to do for the last part? thanks

@rwightman (Collaborator)

@Coderx7 reduction is the spatial reduction (from the input image size); it's only complained about if it decreases. It's not used directly by timm, but some downstream users want it for calculating interpolation ratios.

If you look at the RexNet example, the reduction should *= 2 every time there is a strided layer; the majority of ImageNet networks are stride 32. num_chs has no restrictions on increasing or decreasing, though.
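A toy illustration of that bookkeeping; the (out_chs, stride) config below is made up, not a real SimpleNet variant:

```python
stage_cfg = [(128, 2), (256, 2), (512, 1), (2048, 2)]  # (out_chs, stride), made up
feature_info = [dict(num_chs=64, reduction=2, module='stem')]
reduction = 2  # stride accumulated so far (stem is stride 2)
for i, (out_chs, stride) in enumerate(stage_cfg):
    reduction *= stride  # only doubles at strided stages
    feature_info.append(
        dict(num_chs=out_chs, reduction=reduction, module=f'features.stage_{i}'))
# reductions come out as 4, 8, 8, 16: monotonically non-decreasing
```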

@Coderx7 (Author) commented Feb 21, 2023

@rwightman I thought the idea was to provide feature maps of different sizes for downstream usage, not to capture only the stride-2 points per se.
Currently, if the assert in

```python
assert 'reduction' in fi and fi['reduction'] >= prev_reduction
```

is not disabled, this won't work.
So I need to do one of the following:

1. Have 4 stages and only include reduction rates for 3 of them (that is, don't include the last stage's reduction rate in feature_info).
2. Have 3 stages, merging the last two (3 and 4), for 3 stages in total with 3 reduction rates.
3. Alter the FeatureInfo class to take a new argument that allows cases like this.

- The issue with the first option is that users will lose the last two layers of the network if they opt to use features_only; other than that, normal usage stays the same.
- The issue with the second option is that users can't fully experiment with stage 4; they'd have to split it out manually, which nullifies the purpose of features_only, I guess.
- The last option seems like a good idea to me: with a default value that works for all current models, the current behavior is maintained while also allowing cases like this. Unless that check has more significance and affects lots of other parts of the library that I'm not aware of yet.

So which option should I take so I can hopefully finish this up?
Thanks a lot in advance.

@Coderx7 (Author) commented Feb 23, 2023

@rwightman I'd really appreciate it if you could kindly have a look and decide on the next step, so I can finalize the changes accordingly and have this finished.

@rwightman (Collaborator)

@Coderx7 sorry, I have a lot on my plate right now, wrapping up a few things before I'm on vacation for a bit. I'm going to have to leave this one hanging for a while, as I don't think we're on the same page.

The net is simple, as per its name, and I didn't see any merging, upscaling, or anything else that could result in a feature map increasing in size; it reduces by 2 at each downscale. I feel we're lost in semantics.

@Coderx7 (Author) commented Feb 23, 2023

@rwightman out of the last three conv layers, two (the 2048- and 256-channel ones) have kernel_size=1 but use a padding of 1. That causes the feature map size to increase from 7x7 (after the downsampling) to 9x9 (after the first 1x1 conv), and the next 1x1 conv increases that to 11x11, which is why the effective reduction varies that way.
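That follows from the usual conv output-size formula, H_out = floor((H_in + 2 * pad - k) / stride) + 1; a standalone check of the two 1x1, padding-1 layers (demo only, not the PR code):

```python
import torch
import torch.nn as nn

# with k=1, pad=1, stride=1 each layer grows the map by 2 pixels
# (one zero-pixel per side): 7 -> 9 -> 11
x = torch.randn(1, 512, 7, 7)
conv1 = nn.Conv2d(512, 2048, kernel_size=1, stride=1, padding=1)
conv2 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=1)
y = conv1(x)
print(y.shape)         # torch.Size([1, 2048, 9, 9])
print(conv2(y).shape)  # torch.Size([1, 256, 11, 11])
```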
OK, no problem; please take your time and let's continue this when you are free.
I really appreciate you taking the time despite your busy schedule.

@Coderx7 (Author) commented Mar 13, 2023

@rwightman May I ask if your vacation is over and if we can hopefully get this last step worked out?

@rwightman (Collaborator)

@Coderx7 I've been trying to get on top of my own tasks since getting back. I looked at this a bit more, and I don't really like the padding issue that is the reason for the expanding dims; having a padding of 1 for a 1x1 conv makes zero sense to me. It adds data to the signal path that isn't meaningful. So I'm hesitant to add the model at all with quirks like that present...

@Coderx7 (Author) commented Mar 17, 2023

@rwightman Thanks, I really appreciate it, knowing how busy your schedule is.
It's not really any different from using (zero-)padding on the input.
This happened by accident, but after I noticed it, the padded versions performed better than the no-padding versions in a few experiments I ran afterward; it looked to me as if it creates a kind of regularization effect.
I can run more experiments to further validate this point (or the lack thereof, if that turns out to be the case), if that's your concern.
My main concern is that it takes a lot of time to train these models again (it took me several months, as I don't have access to anything powerful, just a single GPU), but I'll try my best to address your concerns.

@rwightman (Collaborator)

@Coderx7 in deep learning it seems almost any extra activations (or parameters) can and will be used to improve the loss during optimization, but I'd argue not in particularly useful ways (and possibly harmful ones for segmentation/object detection, as they'd add a 'border' effect at the feature level). They get blended back into the signal via the subsequent 3x3 conv. I did test these, and per the goal of running fast, the extra padding does have a measurable speed impact (not significant, but there).

The rest of the net is fine: simple, as per the name, which isn't a bad thing to have in timm, as such nets can be the best option for some tasks. If the padding issue is fixed (padding == kernel_size // 2 should do fine for this net) and the models are retrained, I'd definitely include it with the tweaks mentioned.

Do you have hparams for these? I have two idle 2x Titan RTX machines right now; I could put them to work if you push any outstanding arch changes to this PR.
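A sketch of that padding rule applied to a ConvBNReLU-style helper; the helper name, arguments, and momentum value mirror the printouts earlier in the thread, and the actual PR code may differ:

```python
import torch.nn as nn

# padding = kernel_size // 2 keeps spatial size unchanged at stride 1
# for odd kernels: 1x1 -> pad 0, 3x3 -> pad 1
def conv_bn_relu(in_chs, out_chs, kernel_size=3, stride=1, drop_rate=0.):
    return nn.Sequential(
        nn.Conv2d(in_chs, out_chs, kernel_size, stride,
                  padding=kernel_size // 2),
        nn.BatchNorm2d(out_chs, momentum=0.05),
        nn.ReLU(inplace=True),
        nn.Dropout2d(drop_rate),
    )
```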

@Coderx7

This comment was marked as resolved.

@Coderx7

This comment was marked as outdated.

    the last two 1x1 convs now use no padding; this is done to bring
    the architecture in line with timm's standards.
    Because of this change the pretrained weights are no longer valid
    and the model needs to be retrained.
These are the updated ImageNet pretrained weights with improved accuracy.
@Coderx7 (Author) commented Apr 14, 2023

@rwightman Hi, hope you are doing great.
I finally finished training the new weights and have just updated the PR.
Would you please kindly tell me what you think?
Thanks a lot in advance.

@Coderx7 (Author) commented Jul 25, 2023

@rwightman it's been a few months since my last changes; could you kindly tell me if everything is OK or if I'm missing something here?
I'd really like to make this happen, if you're willing, of course.
Thanks a lot in advance.
