genome

Read the writeup on my website!

Binary neural networks have a low memory footprint and run crazy fast. Let's use that to speed up reinforcement learning.

For example, this network is 600 bytes and performs 500,000 evaluations/second on my laptop CPU:

(Demo animation: a binary network balancing CartPole)

Theory

The goal of this project is to train binary neural networks directly in a reinforcement learning environment using natural evolution strategies. Binary networks are a great fit for RL because their forward passes are tiny and extremely cheap, and evolution strategies need huge numbers of them. The two sections below cover each ingredient.

Binary neural networks

The networks implemented here are a modification of XNOR-Nets. Each layer's weights, inputs, and outputs are constrained to be vectors of ±1 values, stored in a packed binary encoding, and the sign function serves as the nonlinearity. Each layer computes f(x; W, b) = sign(Wᵀx + b).
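As a point of reference (plain NumPy, not the project's packed implementation), such a layer is just:

```python
import numpy as np

def binary_layer(x, W, b):
    """Dense ±1 layer: f(x; W, b) = sign(Wᵀx + b), with sign(0) mapped to +1."""
    pre_activation = W.T @ x + b
    return np.where(pre_activation >= 0, 1, -1)

# Tiny example: 8 inputs, 4 outputs, every weight and activation in {-1, +1}.
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=8)
W = rng.choice([-1, 1], size=(8, 4))
b = np.zeros(4, dtype=int)
y = binary_layer(x, W, b)  # 4 values, each -1 or +1
```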

In exchange, these networks have an extremely fast forward pass, because the dot product of two bit-packed ±1 vectors x and y is n_bits - 2 * popcount(x XOR y), which can be computed in just a few clock cycles. By baking the scaling, the subtraction, and the bias into the comparison for the sign function, we can speed up inference even more. Each activation / weight vector is stored as a uint64_t, so memory access is very fast and memory usage is extremely low.
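For illustration, here is the same arithmetic on bit-packed words in plain Python; the packing convention (bit 1 ↦ +1, bit 0 ↦ −1) and the function names are assumptions for this sketch, and the repo's real kernels live in the Cython/CUDA extensions:

```python
def packed_dot(x_bits: int, w_bits: int, n_bits: int) -> int:
    """Dot product of two ±1 vectors packed as bits (1 -> +1, 0 -> -1).

    Matching bits contribute +1 and mismatching bits -1, so the dot
    product equals n_bits - 2 * popcount(x XOR w).
    """
    mismatches = bin(x_bits ^ w_bits).count("1")  # popcount
    return n_bits - 2 * mismatches


def packed_unit(x_bits: int, w_bits: int, bias: int, n_bits: int) -> int:
    """One output unit, sign(w·x + bias), as a single popcount-vs-threshold test.

    sign(n_bits - 2 * mismatches + bias) is +1 exactly when
    2 * mismatches <= n_bits + bias, so the subtraction and the bias
    fold into the comparison.
    """
    mismatches = bin(x_bits ^ w_bits).count("1")
    return 1 if 2 * mismatches <= n_bits + bias else -1
```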

Natural evolution strategies

Evolution strategies work by maintaining a search distribution over neural networks. At each step, we sample a population of networks from the distribution, evaluate each one, and then update the search distribution towards the highest-performing samples. Natural evolution strategies (NES) perform this update by following the natural gradient of expected reward with respect to the distribution's parameters.
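Schematically, one generation of the loop looks like this (the names sample, evaluate, and update are placeholders for this sketch, not the repo's API):

```python
def es_generation(params, sample, evaluate, update, population_size=128):
    """One generation of a generic evolution-strategies loop.

    params:   parameters of the search distribution over networks
    sample:   draws one candidate network from the distribution
    evaluate: runs episodes and returns the candidate's total reward
    update:   moves the distribution towards high-reward candidates
    """
    population = [sample(params) for _ in range(population_size)]
    rewards = [evaluate(net) for net in population]
    return update(params, population, rewards)
```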

To train binary networks with NES, I use a separable Bernoulli distribution, parameterized by a vector of logits. Each generation's update computes the closed-form natural gradient with respect to the bit probabilities, and then backpropagates through the sigmoid function to update the logits.
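A minimal NumPy sketch of that update, assuming the standard Bernoulli score function and diagonal Fisher information (reward standardization is added here for stability; the repo's exact normalization and hyperparameters may differ):

```python
import numpy as np

def nes_bernoulli_step(logits, evaluate, population_size=128, lr=0.1, rng=None):
    """One NES generation for a separable Bernoulli search distribution over bits.

    logits:   per-bit logits parameterizing the distribution (illustrative sketch)
    evaluate: maps a sampled 0/1 bit vector to an episode reward
    """
    rng = rng or np.random.default_rng()
    p = 1.0 / (1.0 + np.exp(-logits))                     # bit probabilities
    samples = (rng.random((population_size, p.size)) < p).astype(float)
    rewards = np.array([evaluate(x) for x in samples])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Closed-form natural gradient w.r.t. p: the diagonal Fisher 1 / (p (1 - p))
    # cancels the score function's denominator, leaving E[reward * (x - p)].
    nat_grad_p = (rewards[:, None] * (samples - p)).mean(axis=0)
    # Backpropagate through the sigmoid: dp/dlogit = p * (1 - p).
    grad_logits = nat_grad_p * p * (1.0 - p)
    return logits + lr * grad_logits
```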

Building the project

This project has a fairly large number of heterogeneous dependencies, which makes building it somewhat of a pain. A Poetry build hook should build the whole project when you run poetry install. The full build chain is:

  • Create device object files for the GPU with nvcc.
  • Transpile the .pyx files (which wrap the CUDA code in a Python-accessible way) into C++ with Cython.
  • Compile the newly created C++ files into shared-object libraries that Python can import.
  • Install the Python dependencies and perform the remaining Python module installation tasks.

Usage

Executing poetry run python genome/demo.py will run a demo that trains a small neural network to balance a pole. If you're on a graphical system, it should render episodes periodically as the model learns. It also writes logs to outputs, which you can inspect with tensorboard --logdir=outputs to watch training metrics evolve.

Every run saves its own logs, so each run needs a unique name. If you want to run the same command twice, either delete the old log files under that name or pick a new name.

About

Compute-efficient reinforcement learning with binary neural networks and evolution strategies.
