Adaptive-Gradient-Clipping

This repository provides a minimal implementation of adaptive gradient clipping (AGC), as proposed in High-Performance Large-Scale Image Recognition Without Normalization [1], in TensorFlow 2. The paper identifies AGC as a crucial component for training deep neural networks without batch normalization [2]. Readers are encouraged to consult the paper to understand why one might want to train networks without batch normalization, given its paramount success.
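
For reference, below is a minimal sketch of the unit-wise clipping rule the paper describes: a layer's gradient is rescaled whenever the ratio of its unit-wise gradient norm to the corresponding parameter norm exceeds the clipping factor. This is only a sketch built from standard TensorFlow 2 ops; the notebooks contain the implementation actually used for the experiments.

```python
import tensorflow as tf

def unitwise_norm(x):
    """Norm over the axes AGC treats as one 'unit' (per output unit)."""
    if len(x.shape) <= 1:            # biases, scalars
        axis, keepdims = None, False
    elif len(x.shape) in (2, 3):     # dense kernels: (in, out)
        axis, keepdims = 0, True
    elif len(x.shape) == 4:          # conv kernels: (H, W, in, out)
        axis, keepdims = (0, 1, 2), True
    else:
        raise ValueError(f"Unexpected parameter shape: {x.shape}")
    return tf.math.sqrt(tf.math.reduce_sum(tf.square(x), axis=axis, keepdims=keepdims))

def adaptive_clip_grad(params, grads, clip_factor=0.01, eps=1e-3):
    """Rescales each gradient whose unit-wise norm exceeds clip_factor * parameter norm."""
    clipped = []
    for p, g in zip(params, grads):
        p_norm = unitwise_norm(p)
        g_norm = unitwise_norm(g)
        max_norm = clip_factor * tf.math.maximum(p_norm, eps)
        # Rescale only the units whose gradient-to-parameter norm ratio is too large.
        rescaled = g * (max_norm / tf.math.maximum(g_norm, 1e-6))
        clipped.append(tf.where(g_norm < max_norm, g, rescaled))
    return clipped
```

The clipped gradients can then be passed to any optimizer via `optimizer.apply_gradients(zip(clipped, params))`.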

My goal with this repository is to be able to quickly train shallow networks with and without AGC. Therefore, I provide two Colab Notebooks which I discuss below.

About the notebooks

  • AGC.ipynb: Demonstrates training of a shallow network (only 0.002117 million parameters) with AGC.
  • BatchNorm.ipynb: Demonstrates training of a shallow network (only 0.002309 million parameters) with batch normalization.

Both notebooks are executable end-to-end on Google Colab. Furthermore, they utilize the free TPUs (TPUv2-8) Google Colab provides, allowing readers to experiment very quickly.

Findings

Before moving to the findings, please keep the following in mind:

  • The network I used to demonstrate the results is extremely shallow.
  • The network is a mini VGG [3]-style network, whereas the original paper focuses on ResNet [4]-style architectures.
  • The dataset I experimented with (a flowers dataset) consists of ~3500 samples.
  • I clipped the gradients of all the layers, whereas in the original paper the final linear layer was not clipped (refer to Section 4.1 of the paper); a sketch of this is shown after this list.
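
To make the last point concrete, below is a rough sketch (not the notebooks' exact training loop) of a custom training step that applies AGC to every trainable variable except those of the final classification layer, as Section 4.1 suggests. The layer name "classifier" is a hypothetical placeholder, and `adaptive_clip_grad` refers to the sketch shown earlier.

```python
import tensorflow as tf

@tf.function
def train_step(model, optimizer, loss_fn, images, labels, clip_factor=0.01):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    variables = model.trainable_variables
    grads = tape.gradient(loss, variables)

    new_grads = []
    for var, grad in zip(variables, grads):
        if "classifier" in var.name:   # hypothetical name of the final Dense layer
            new_grads.append(grad)     # leave the final linear layer unclipped
        else:
            new_grads.append(adaptive_clip_grad([var], [grad], clip_factor=clip_factor)[0])

    optimizer.apply_gradients(zip(new_grads, variables))
    return loss
```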

Comparing the training progress of the two networks (trained with and without AGC), we see that training is more stable with AGC.

[Figure: training progress with Batch Normalization (left) vs. AGC (right)]

In the table below, I summarize the results of the two aforementioned notebooks:

                            Number of Parameters (million)   Final Validation Accuracy (%)   Training Time (seconds)
Batch Normalization         0.002309                         54.67                           2.7209
Adaptive Gradient Clipping  0.002117                         52                              2.6145

For these experiments, I used a batch size of 512 (each batch having a shape of (512, 96, 96, 3)) and a clipping factor of 0.01 (applicable only to AGC).
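
The snippet below sketches how such an input pipeline might be set up. It assumes the flowers dataset is `tf_flowers` from TensorFlow Datasets (~3,670 images) and that images are resized to 96x96; the notebooks define the exact pipeline and TPU setup.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

BATCH_SIZE = 512    # each batch then has shape (512, 96, 96, 3)
IMG_SIZE = 96
CLIP_FACTOR = 0.01  # clipping factor, used only for the AGC run

def preprocess(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# tf_flowers is assumed to be the "flowers dataset" referenced above.
train_ds = (
    tfds.load("tf_flowers", split="train", as_supervised=True)
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1024)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)
```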

These results SHOULD NOT be treated as conclusive. For details about the training configuration (network depth, learning rate, etc.), please refer to the notebooks.

Citations

[1] Brock, Andrew, et al. “High-Performance Large-Scale Image Recognition Without Normalization.” ArXiv:2102.06171 [Cs, Stat], Feb. 2021. arXiv.org, http://arxiv.org/abs/2102.06171.

[2] Ioffe, Sergey, and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” ArXiv:1502.03167 [Cs], Mar. 2015. arXiv.org, http://arxiv.org/abs/1502.03167.

[3] Simonyan, Karen, and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” ArXiv:1409.1556 [Cs], Apr. 2015. arXiv.org, http://arxiv.org/abs/1409.1556.

[4] He, Kaiming, et al. “Deep Residual Learning for Image Recognition.” ArXiv:1512.03385 [Cs], Dec. 2015. arXiv.org, http://arxiv.org/abs/1512.03385.

Acknowledgements

I referred to the following resources during experimentation: