Light Implementation on Model Compression

  • Duke BME-590-02 Biomedical Machine Learning Final Project
  • A practice project to understand the effect of model compression in a Keras/TensorFlow environment
  • Original paper: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (click here)
  • In BME_Project_3_Model_Compression.ipynb (click here), a 3-stage model compression pipeline is implemented:
    1. pruning
    2. quantization
    3. Huffman coding

Experimental Setting

DenseNet100-12 trained from scratch

Results

Results of direct pruning on the pretrained model; all checkpoints are saved here.
Figure: direct pruning

Results of iterative pruning on the pretrained model; all checkpoints are saved here.
Figure: iterative pruning

Compression rates after applying iterative pruning and quantization to the pretrained model are summarized in the compression rate table.

Highlights

The results show that iterative pruning slims the model while preserving its performance. For example, the pruned model with 70% sparsity still achieves 68% accuracy. The hyperparameters (epochs, learning rate, etc.) were kept the same for all sparsity levels in the experiment, but they can always be tuned per sparsity level.

Direct pruning hurts the model performance, but if retraining (i.e., fine-tuning that updates only the non-zero weights) is applied afterwards, the performance, evaluated here as accuracy, recovers according to the paper.
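For illustration, a minimal sketch of this mask-and-retrain idea (not the notebook's exact code; the tiny model and random data below are placeholders for the pretrained DenseNet and the real training set):

```python
import numpy as np
import tensorflow as tf

def magnitude_prune(model, sparsity=0.7):
    """Zero out the smallest-magnitude weights of every Dense/Conv2D kernel
    and return binary masks marking the surviving positions."""
    masks = {}
    for layer in model.layers:
        if not isinstance(layer, (tf.keras.layers.Dense, tf.keras.layers.Conv2D)):
            continue
        kernel, *rest = layer.get_weights()
        threshold = np.percentile(np.abs(kernel), sparsity * 100)
        mask = (np.abs(kernel) > threshold).astype(kernel.dtype)
        layer.set_weights([kernel * mask, *rest])
        masks[layer.name] = mask
    return masks

def reapply_masks(model, masks):
    """Re-zero the pruned positions after each fine-tuning epoch so that,
    effectively, only the non-zero weights are updated."""
    for layer in model.layers:
        if layer.name in masks:
            kernel, *rest = layer.get_weights()
            layer.set_weights([kernel * masks[layer.name], *rest])

# Placeholder model/data; the notebook would use the pretrained DenseNet here.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
x, y = np.random.randn(256, 32), np.random.randint(0, 10, 256)

masks = magnitude_prune(model, sparsity=0.70)
for _ in range(3):                       # fine-tune, keeping pruned weights at zero
    model.fit(x, y, epochs=1, verbose=0)
    reapply_masks(model, masks)
```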

Since the pruned model may have difficulty recovering from direct pruning, iterative pruning is commonly applied in practice. TensorFlow provides a ready-to-use package, Magnitude-based weight pruning with Keras (click here), which can be used directly to practice iterative pruning, with the caveat that it is only compatible with TF 1.x.
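A minimal sketch of how that package is typically wired into a Keras training loop (the small model and random data are placeholders; the import path shown is the current `tensorflow_model_optimization` package, whose module layout differs slightly between the TF 1.x and TF 2.x releases):

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model/data; the notebook would use the pretrained DenseNet
# and the real training set instead.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
x, y = np.random.randn(512, 64), np.random.randint(0, 10, 512)

# Pruning schedule: ramp the sparsity from 0% to 70% over the fine-tuning steps.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.70,
        begin_step=0, end_step=1000),
}

pruned = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])

# UpdatePruningStep advances the schedule during training; strip_pruning
# removes the wrappers afterwards so only the sparse weights remain.
pruned.fit(x, y, epochs=5, verbose=0,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
final_model = tfmot.sparsity.keras.strip_pruning(pruned)
```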

Quantization preserves model performance using a technique called weight sharing, which is implemented here with the K-means algorithm from the sklearn package. Huffman coding is lossless, so it does not affect model performance; it is implemented per layer in the .ipynb file.
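A minimal sketch of both stages on a single layer's kernel (the synthetic kernel, the 5-bit codebook, and the helper names are illustrative, not the notebook's exact code):

```python
import heapq
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def share_weights(kernel, bits=5):
    """Weight sharing: cluster the non-zero weights into 2**bits centroids
    with K-means and replace each weight by its centroid value."""
    mask = kernel != 0
    nonzero = kernel[mask].reshape(-1, 1)
    kmeans = KMeans(n_clusters=2 ** bits, n_init=1).fit(nonzero)
    quantized = kernel.copy()
    quantized[mask] = kmeans.cluster_centers_[kmeans.predict(nonzero)].ravel()
    return quantized

def huffman_codes(values):
    """Build a prefix-free Huffman code over the value frequencies.
    The coding is lossless, so accuracy is unchanged; only storage shrinks."""
    heap = [[freq, [val, ""]] for val, freq in Counter(values).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

# Demo on a synthetic kernel standing in for a pruned layer's weights.
kernel = np.random.randn(3, 3, 16, 16)
kernel[np.abs(kernel) < 0.7] = 0.0          # pretend it has already been pruned
quantized = share_weights(kernel, bits=5)
codes = huffman_codes(quantized.ravel().tolist())
avg_bits = np.mean([len(codes[v]) for v in quantized.ravel().tolist()])
print(f"average code length: {avg_bits:.2f} bits per weight")
```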

Thanks

Duke ECE 590-10 (Yiran Chen and Hai Li), Lecture 13 slides on Model Compression
Duke BME 590-02 (Ouwen Huang and Xiling Shen)
