
Improve convnext v1 amp speed. #1747

Open · tascj wants to merge 1 commit into main

Conversation


@tascj commented Mar 28, 2023

Modification

  1. Add an option to fuse LayerScale into the last linear layer of the Mlp. Fewer elementwise operations improve AMP train/infer speed (see the sketch after this list).
  2. Reshape x to 2D before the Mlp's linear layers, which slightly improves speed.

Together with the fast_norm option, ConvNeXt train/infer is much faster.
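For illustration, here is a minimal PyTorch sketch of both ideas, based on the identity `gamma * (x @ W.T + b) == x @ (gamma[:, None] * W).T + gamma * b`: the per-channel LayerScale can be applied to the second linear's small weight and bias tensors instead of the large activation tensor. The module and parameter names below (`FusedScaleMlp`, `ls_init`) are illustrative assumptions, not the PR's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedScaleMlp(nn.Module):
    """Sketch of an MLP with LayerScale folded into the second linear.

    gamma * (x @ W.T + b) == x @ (gamma[:, None] * W).T + (gamma * b),
    so the scale touches the (dim, hidden) weight once per forward
    instead of the full (N*H*W, dim) activation tensor.
    """

    def __init__(self, dim, hidden_dim, ls_init=1e-6):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.gamma = nn.Parameter(ls_init * torch.ones(dim))

    def forward(self, x):
        # reshape trick: flatten (N, H, W, C) to 2D so the linears run as
        # plain matmuls, then restore the original shape at the end
        shape = x.shape
        x = x.reshape(-1, shape[-1])
        x = self.act(self.fc1(x))
        # fold gamma into fc2's parameters instead of a separate
        # elementwise multiply over the full activation tensor
        w = self.fc2.weight * self.gamma[:, None]
        b = self.fc2.bias * self.gamma
        x = F.linear(x, w, b)
        return x.reshape(shape)
```

Autograd still produces correct gradients for gamma, since w and b are recomputed from it on every forward; the elementwise work simply moves from the activations to the much smaller parameter tensors.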

Benchmark

RTX 3090, pt112-cu113 (PyTorch 1.12, CUDA 11.3), apex not installed

`python benchmark.py --model convnext_tiny --img-size 224 --amp`

conv_mlp=False

| fast_norm | reshape_x | fast_layerscale | infer samples/sec | train samples/sec | infer (relative) | train (relative) |
|---|---|---|---|---|---|---|
| N | N | N | 2208.06 | 793.61 | 1 | 1 |
| Y | N | N | 2485.99 | 858.76 | 1.12587 | 1.08209 |
| N | Y | N | 2320.33 | 806.36 | 1.05085 | 1.01607 |
| N | N | Y | 2381.74 | 867.64 | 1.07866 | 1.09328 |
| Y | Y | N | 2623.06 | 872.98 | 1.18795 | 1.10001 |
| Y | N | Y | 2816.98 | 980.99 | 1.27577 | 1.23611 |
| N | Y | Y | 2514.63 | 883.65 | 1.13884 | 1.11346 |
| Y | Y | Y | 2991.11 | 1001.68 | 1.35463 | 1.26218 |

conv_mlp=True

| fast_norm | fast_layerscale | infer samples/sec | train samples/sec | infer (relative) | train (relative) |
|---|---|---|---|---|---|
| N | N | 2249.58 | 793.71 | 1 | 1 |
| Y | N | 2535.58 | 859.07 | 1.12713 | 1.08235 |
| N | Y | 2430.69 | 869.33 | 1.08051 | 1.09527 |
| Y | Y | 2875.24 | 982.37 | 1.27812 | 1.23769 |

ImageNet validation

| model | top-1 | top-5 |
|---|---|---|
| convnext_tiny | 84.444 | 97.326 |
| convnext_tiny + fast_layerscale | 84.454 | 97.33 |

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
