
Note: this is still an ongoing project; changes and results will keep appearing as experiments are run.

Binary Neural Networks

An attempt to recreate neural networks in which the weights and activations are binary variables. This is the common underlying theme of the papers BinaryConnect, Binarized Neural Networks and XNOR-Net. The goal is to free these high-performing deep learning models from the shackles of a supercomputer (read: GPUs) and bring them to edge devices, which typically have far less memory and limited compute.

Quantize

NOTE TO SELF: with the shallow CIFAR model, test accuracy was 63%.

Requirements

  1. Theano (0.9.0 or higher)
  2. Python 2.7
  3. numpy
  4. tqdm (awesome progbars)
  5. tensorboard (logging)

Idea

  1. Regularization: binarizing the weights injects noise into the system, so, much like dropout, binarization can act as a regularizer.
  2. Discretization is a form of corruption: each weight picks up a discretization error, but because these errors are roughly random across weights, they largely cancel out.
  3. Perform the forward and backward passes using the binarized weights, but keep a second, full-precision (fp32) copy of the weights for the gradient update, because SGD makes small changes that would otherwise be lost to binarization. For each forward pass, derive a set of binary weights from the full-precision weights using deterministic or stochastic binarization (see the sketch after this list).
  4. Deterministic binarization: Wb = +1 if W >= 0, else -1.
  5. Stochastic binarization: (TODO)
  6. Clip the full-precision weights to [-1, +1]: values outside that range no longer affect the binarized network, so clipping keeps them from growing without bound.
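
A minimal numpy sketch of the update loop described above (function names such as `binarize` and `sgd_step` are illustrative and do not come from this repository's code):

```python
import numpy as np

def binarize(W):
    # Deterministic binarization: +1 where W >= 0, -1 elsewhere.
    return np.where(W >= 0, 1.0, -1.0).astype(W.dtype)

def sgd_step(W_fp32, grad, lr=0.001):
    # Apply the update to the full-precision copy so that small SGD changes
    # accumulate, then clip to [-1, +1]: values outside that range can no
    # longer change the binarized weights.
    W_fp32 = W_fp32 - lr * grad
    return np.clip(W_fp32, -1.0, 1.0)

# One hypothetical training step for a single weight matrix:
W_fp32 = np.random.uniform(-1, 1, size=(784, 512)).astype(np.float32)
W_bin = binarize(W_fp32)          # forward/backward pass uses W_bin
grad = np.zeros_like(W_fp32)      # stand-in for dLoss/dW_bin
W_fp32 = sgd_step(W_fp32, grad)   # the update lands on the fp32 copy
```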

Experiment 1: MNIST

In this experiment, a baseline and a binarized version of a standard MLP are trained on the MNIST dataset. To keep the comparison fair, the data is not augmented, and dropout is applied to the baseline network to offset the regularizing effect that binarization gives the binary network. The learning rate is a constant 0.001, and both networks are trained for 200 epochs with a batch size of 256. The training loss and validation accuracy are visualized in TensorBoard using this gist. The basic architecture is as follows:

[architecture diagram]
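
For illustration only, a binarized fully-connected layer in Theano might be wired up as below; the layer sizes are placeholders and not necessarily those in the diagram above:

```python
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)

# Full-precision shared weights; the binarized copy is derived from them
# symbolically on every forward pass.
W_fp32 = theano.shared(rng.uniform(-1, 1, (784, 512)).astype("float32"), name="W")
b = theano.shared(np.zeros(512, dtype="float32"), name="b")

x = T.matrix("x")
W_bin = T.switch(T.ge(W_fp32, 0), 1.0, -1.0)   # deterministic binarization
h = T.nnet.relu(T.dot(x, W_bin) + b)           # forward pass uses binary weights

forward = theano.function([x], h)
```

In training, the gradient of the loss would be taken with respect to the binarized weights and applied to the full-precision copy, as described in the Idea section.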

Results

1. Baseline

[training loss / validation accuracy curves]

Test accuracy: 0.9687

2. Binarized

[training loss / validation accuracy curves]

Test accuracy: 0.9372

Experiment 2: CIFAR 10

Results

(TODO)

Why?

This repository is the result of my curiosity about the computational problems that arise when deploying (very big) neural networks. I am fascinated by recent research that has come up with effective ways to compress, prune and/or quantize deep networks so that they run in resource-constrained environments like ARM chips. This is a (tiny) step in that direction.

About

Exploring "Binary Neural Networks" (https://arxiv.org/abs/1602.02830) in Theano. A set of experiments that use binarised weights and/or activations to reduce computational load of convolutional neural networks.
