
Slimming Resnet #2

Open
hiyijian opened this issue Sep 21, 2017 · 10 comments


hiyijian commented Sep 21, 2017

Dear @liuzhuang13,
I guess we should prune the corresponding input channels of the subsequent conv layer's kernels after pruning the current layer. Am I right?
So I cannot figure out how to slim a residual block using your method.
[image]
The two branches may have different channels pruned, so can we only prune the intersection of both?

[image]
Almost the same situation arises in the shortcut version. How do you handle this?

Thanks

liuzhuang13 (Owner) commented Sep 22, 2017

In our models, the residual branch is BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV.

At the addition, all features from the identity mapping and from the last CONV in the residual branch are kept, so the main branch retains the original widths of ResNets. Pruning only happens in layers inside the residual branch.

Inside each residual branch:

  1. In the first BN layer, if we detect very small scaling parameters, we mask the corresponding channels out, before the first BN layer, with a channel selection layer (this channel selection actually causes a time overhead, so I don't recommend doing it in practice); see the sketch at the end of this comment.

  2. The last CONV outputs the same number of channels as the main branch (there's no BN after it to do selection).

  3. For the other intermediate layers, pruning is the same as in a plain network (e.g., VGG).

If your residual branch is different from ours, you may need to modify the pruning process. But the key point is that the main branch doesn't get slimmed; pruning happens only inside the residual branch. How you prune within the residual branch depends on how you order your BN and CONV layers.
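
For concreteness, here is a minimal sketch (in PyTorch) of what such a channel selection layer might look like; the class name `ChannelSelection` and the `mask` buffer are illustrative assumptions, and the layer in the released code may be implemented differently.

```python
import torch
import torch.nn as nn

class ChannelSelection(nn.Module):
    """Pass through only the channels whose mask entry is 1.

    Intended to sit at the entry of a pre-activation residual branch so the
    first CONV only sees the channels whose BN scaling factors survived
    pruning. (Illustrative sketch, not the released implementation.)
    """

    def __init__(self, num_channels):
        super().__init__()
        # 1 = keep the channel, 0 = prune it; keep everything by default.
        self.register_buffer("mask", torch.ones(num_channels))

    def forward(self, x):
        keep = self.mask.bool()
        if keep.all():
            return x
        # Index-select surviving channels: (N, C, H, W) -> (N, C', H, W).
        return x[:, keep, :, :]
```

The indexing in `forward` is exactly the overhead mentioned in point 1: unlike physically removing filters, the selection has to materialize a smaller tensor on every forward pass.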

@hiyijian (Author)

Thanks. Do you think the sparsity will be affected if the BN layers on the main branch are not penalized by the L1 norm? If yes, how?
Thanks

@liuzhuang13 (Owner)

What I mean by "main branch" is the identity shortcut path throughout the network, so there are no BN layers in the main branch. Whenever there is a BN, we can do channel pruning or selection according to its scaling parameters. Thanks!
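
As an illustration of selecting channels by BN scaling parameters, the sketch below gathers |gamma| from every BatchNorm2d layer, sets a global percentile threshold, and builds a per-layer keep mask. The helper name `bn_keep_masks` and the default `prune_ratio` are assumptions for illustration, not the repository's exact procedure.

```python
import torch
import torch.nn as nn

def bn_keep_masks(model: nn.Module, prune_ratio: float = 0.5):
    """Build a boolean keep mask for every BN layer from its scaling factors.

    Channels whose |gamma| falls below a global percentile threshold are
    marked for pruning. (Illustrative sketch of the selection criterion.)
    """
    # Gather the absolute scaling factors of all BN layers in the network.
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)

    # True = keep the channel, False = candidate for pruning.
    return {name: m.weight.data.abs() > threshold
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```

Since the main branch here has no BN layers, such masks naturally arise only inside the residual branches.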

@youngfly11

Hi @liuzhuang13, can you release the code for DenseNet slimming? Thank you.

@liuzhuang13 (Owner)

Hi @youngfly11, thanks for your interest. DenseNet's code is a little different from VGG's. Unfortunately I am busy with other things right now, so I will probably release the code when I have time next month.

The way I implemented DenseNet slimming can save parameters and FLOPs; however, it cannot bring a speedup in the current Torch package. I implemented it using a channel selection layer, which leads to slower inference than a normal network because it involves a memory copy rather than in-place selection.

If you just want the same speed as a normal network, after training you can set the low scaling factors and their corresponding biases to 0 and skip gradient updates on them. That is equivalent to actually pruning the channels.
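
A minimal sketch of that zero-out approach in PyTorch, assuming you already have a scaling-factor threshold; the hook-based way of blocking gradient updates is one possible choice, not necessarily how the authors did it.

```python
import torch.nn as nn

def zero_out_small_channels(model: nn.Module, threshold: float):
    """Zero the BN scaling factors and biases of weak channels and freeze them.

    The architecture is left unchanged; the masked channels simply output
    zeros, mimicking pruned channels. (Illustrative sketch.)
    """
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            prune = m.weight.data.abs() < threshold
            m.weight.data[prune] = 0.0
            m.bias.data[prune] = 0.0
            # Block gradient updates on the zeroed entries so fine-tuning
            # cannot revive them.
            m.weight.register_hook(lambda g, p=prune: g.masked_fill(p, 0.0))
            m.bias.register_hook(lambda g, p=prune: g.masked_fill(p, 0.0))
```

Because the architecture is untouched, inference runs at the speed of the unpruned network while the zeroed channels contribute nothing, which is why it is equivalent to actually pruning them.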

Thanks

@liuzhuang13 (Owner)

In case you're still interested, we've released our PyTorch implementation here: https://github.com/Eric-mingjie/network-slimming. It supports ResNet and DenseNet.

@hiyijian (Author)

Thanks

@yyjabidintg

Thanks for your wonderful work.
But if the residual branch is CONV-RELU-BN-CONV-RELU-BN-CONV-RELU-BN, then the pruned output channels of this residual branch no longer match the main branch's.
How should I handle this situation?
Thank you.


toyal commented Oct 25, 2019

(Quoting @yyjabidintg's question above about the CONV-RELU-BN-ordered residual branch.)

Hi, have you solved this problem? I also encounter this issue.


toyal commented Oct 25, 2019

(Quoting @hiyijian's original question about pruning the two branches of a residual block.)

Hi, how do you handle this situation? Thanks.
