genome

Read the writeup on my website!

Binary neural networks have a low memory footprint and run crazy fast. Let's use that to speed up reinforcement learning.

For example, this network is 600 bytes and performs 500,000 evaluations/second on my laptop CPU:

(Demo animation: a binary network balancing CartPole)

Theory

The goal of this project is to train binary neural networks directly in a reinforcement learning environment using natural evolution strategies. Binary networks are a great fit for RL because their forward passes are tiny and extremely cheap, and evolution strategies need huge numbers of them. The two sections below cover each ingredient.

Binary neural networks

The networks implemented here are a modification of XNOR-Nets. Each layer's weights, inputs, and outputs are constrained to be vectors of ±1 values, stored in a packed binary encoding, and the sign function serves as the nonlinearity. Each layer computes f(x; W, b) = sign(Wᵀx + b).
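As a point of reference (plain NumPy, not the project's packed implementation), such a layer is just:

```python
import numpy as np

def binary_layer(x, W, b):
    """Dense ±1 layer: f(x; W, b) = sign(Wᵀx + b), with sign(0) mapped to +1."""
    pre_activation = W.T @ x + b
    return np.where(pre_activation >= 0, 1, -1)

# Tiny example: 8 inputs, 4 outputs, every weight and activation in {-1, +1}.
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=8)
W = rng.choice([-1, 1], size=(8, 4))
b = np.zeros(4, dtype=int)
y = binary_layer(x, W, b)  # 4 values, each -1 or +1
```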

In exchange, these networks have an extremely fast forward pass, because the dot product of two bit-packed ±1 vectors x and y is n_bits - 2 * popcount(x XOR y), which can be computed in just a few clock cycles. By baking the scaling, the subtraction, and the bias into the comparison for the sign function, we can speed up inference even more. Each activation / weight vector is stored as a uint64_t, so memory access is very fast and memory usage is extremely low.
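For illustration, here is the same arithmetic on bit-packed words in plain Python; the packing convention (bit 1 ↦ +1, bit 0 ↦ −1) and the function names are assumptions for this sketch, and the repo's real kernels live in the Cython/CUDA extensions:

```python
def packed_dot(x_bits: int, w_bits: int, n_bits: int) -> int:
    """Dot product of two ±1 vectors packed as bits (1 -> +1, 0 -> -1).

    Matching bits contribute +1 and mismatching bits -1, so the dot
    product equals n_bits - 2 * popcount(x XOR w).
    """
    mismatches = bin(x_bits ^ w_bits).count("1")  # popcount
    return n_bits - 2 * mismatches


def packed_unit(x_bits: int, w_bits: int, bias: int, n_bits: int) -> int:
    """One output unit, sign(w·x + bias), as a single popcount-vs-threshold test.

    sign(n_bits - 2 * mismatches + bias) is +1 exactly when
    2 * mismatches <= n_bits + bias, so the subtraction and the bias
    fold into the comparison.
    """
    mismatches = bin(x_bits ^ w_bits).count("1")
    return 1 if 2 * mismatches <= n_bits + bias else -1
```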

Natural evolution strategies

Evolution strategies work by maintaining a search distribution over neural networks. At each step, we sample a population of networks from the distribution, evaluate each one, and then update the search distribution towards the highest-performing samples. Natural evolution strategies (NES) perform this update by following the natural gradient of expected reward with respect to the distribution's parameters.
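Schematically, one generation of the loop looks like this (the names sample, evaluate, and update are placeholders for this sketch, not the repo's API):

```python
def es_generation(params, sample, evaluate, update, population_size=128):
    """One generation of a generic evolution-strategies loop.

    params:   parameters of the search distribution over networks
    sample:   draws one candidate network from the distribution
    evaluate: runs episodes and returns the candidate's total reward
    update:   moves the distribution towards high-reward candidates
    """
    population = [sample(params) for _ in range(population_size)]
    rewards = [evaluate(net) for net in population]
    return update(params, population, rewards)
```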

To train binary networks with NES, I use a separable Bernoulli distribution, parameterized by a vector of logits. Each generation's update computes the closed-form natural gradient with respect to the bit probabilities, and then backpropagates through the sigmoid function to update the logits.
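A minimal NumPy sketch of that update, assuming the standard Bernoulli score function and diagonal Fisher information (reward standardization is added here for stability; the repo's exact normalization and hyperparameters may differ):

```python
import numpy as np

def nes_bernoulli_step(logits, evaluate, population_size=128, lr=0.1, rng=None):
    """One NES generation for a separable Bernoulli search distribution over bits.

    logits:   per-bit logits parameterizing the distribution (illustrative sketch)
    evaluate: maps a sampled 0/1 bit vector to an episode reward
    """
    rng = rng or np.random.default_rng()
    p = 1.0 / (1.0 + np.exp(-logits))                     # bit probabilities
    samples = (rng.random((population_size, p.size)) < p).astype(float)
    rewards = np.array([evaluate(x) for x in samples])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Closed-form natural gradient w.r.t. p: the diagonal Fisher 1 / (p (1 - p))
    # cancels the score function's denominator, leaving E[reward * (x - p)].
    nat_grad_p = (rewards[:, None] * (samples - p)).mean(axis=0)
    # Backpropagate through the sigmoid: dp/dlogit = p * (1 - p).
    grad_logits = nat_grad_p * p * (1.0 - p)
    return logits + lr * grad_logits
```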

Building the project

This project has a fairly large number of heterogeneous dependencies, which makes building it somewhat of a pain. A Poetry build hook should build the whole project when you run poetry install. The full build chain is:

  • Create device object files for the GPU with nvcc.
  • Transpile the .pyx files (which wrap the CUDA code in a Python-accessible way) into C++ with Cython.
  • Compile the newly created C++ files into shared-object libraries that Python can import.
  • Install the Python dependencies and perform the remaining Python module installation tasks.

Usage

Executing poetry run python genome/demo.py will run a demo that trains a small neural network to balance a pole. If you're on a graphical system, it should render episodes periodically as the model learns. It also writes logs to outputs, which you can inspect with tensorboard --logdir=outputs to watch training metrics evolve.

Every run saves its own logs, so each run needs a unique name. If you want to run the same command twice, either delete the old log files under that name or pick a new name.

About

Compute-efficient reinforcement learning with binary neural networks and evolution strategies.
