
Add RL examples #463

Open · foksly wants to merge 4 commits into master

Conversation

foksly (Collaborator) commented Mar 18, 2022

Current plan:

TODO:

  • make a PPO run with more than 1 peer
  • run baseline PPO on Atari games (adapt from here)
  • run hivemind optimizer with target batch size large enough to average every ~30 seconds

Later:

  • find out what caused the problem with use_local_updates + cuda
  • figure out how to use a learning rate schedule (e.g., disable the default one and pass hivemind.Optimizer(scheduler=...)); a rough sketch of this setup follows below
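
For reference, a minimal sketch of how the items above might translate into a hivemind.Optimizer setup (not the PR's code; the run name, batch sizes, and the linear-decay schedule below are placeholders):

import torch
import hivemind

# the first peer starts a fresh DHT; other peers pass initial_peers=[...] with its multiaddrs
dht = hivemind.DHT(start=True)

model = torch.nn.Linear(128, 4)  # stand-in for the PPO policy network

opt = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",        # placeholder experiment name
    params=model.parameters(),
    optimizer=lambda params: torch.optim.Adam(params, lr=2.5e-4),
    # hand the schedule to hivemind.Optimizer instead of stepping it per-peer,
    # so it advances with the collaborative (global) step count
    scheduler=lambda inner: torch.optim.lr_scheduler.LambdaLR(inner, lambda step: max(0.0, 1 - step / 10_000)),
    batch_size_per_step=256,      # samples this peer contributes per opt.step() (placeholder)
    target_batch_size=16_384,     # placeholder; tune so peers average roughly every ~30 seconds
    use_local_updates=False,      # kept off until the CUDA issue above is understood
    verbose=True,
)

Each peer then calls opt.step() after backprop exactly like a regular optimizer; parameter averaging is triggered automatically once the collaboration accumulates target_batch_size samples.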

foksly changed the title from Create RL to RL examples on Mar 18, 2022
codecov bot commented Mar 18, 2022

Codecov Report

Merging #463 (1eb7522) into master (712e428) will decrease coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #463      +/-   ##
==========================================
- Coverage   85.80%   85.75%   -0.06%     
==========================================
  Files          80       80              
  Lines        7794     7794              
==========================================
- Hits         6688     6684       -4     
- Misses       1106     1110       +4     
Impacted Files                         Coverage Δ
hivemind/averaging/matchmaking.py      87.50% <0.00%> (-1.79%) ⬇️
hivemind/dht/node.py                   91.44% <0.00%> (-0.24%) ⬇️
hivemind/utils/asyncio.py              100.00% <0.00%> (+0.86%) ⬆️
hivemind/dht/dht.py                    91.51% <0.00%> (+1.21%) ⬆️

borzunov changed the title from RL examples to Add RL examples on Mar 19, 2022
justheuristic (Member) commented:

@foksly gentle reminder: do you still have time for the PR?

justheuristic (Member) commented Jun 20, 2022

Great job!
Initial request: please trigger auto-merge (sync with master) and apply linters

black .
isort .

return exp_name


class AdamWithClipping(torch.optim.Adam):
Member commented:

note to @mryab : we've recently merged the same clipping functionality here:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/moe/server/layers/optim.py#L48

Would you prefer if we...

  • keep everything as is, accept some code duplication?
  • extract moe.server.layers.optim to utils.optim and use it here?
  • keep wrapper in hivemind.optim and import from there?
  • insert your option here :)

mryab (Member) commented Jun 20, 2022:

I'm 50:50 between the "keep here, accept duplication" and "move OptimizerWrapper and ClippingWrapper to hivemind.optim.wrapper" solutions, so ultimately, it's @foksly's call

Member commented:

The utils option is also acceptable, but I'm slightly against this folder becoming too bloated. That said, it looks like a reasonable place to put such code, so any solution of these three is fine by me (as long as you don't import the wrapper from hivemind.moe)
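
For readers without the diff open, the class under discussion is roughly of this shape (an illustrative sketch only, not the PR's exact code; max_grad_norm is an assumed parameter name):

import torch


class AdamWithClipping(torch.optim.Adam):
    """Adam that clips the global gradient norm before every step (sketch)."""

    def __init__(self, params, *args, max_grad_norm: float = 1.0, **kwargs):
        super().__init__(params, *args, **kwargs)
        self.max_grad_norm = max_grad_norm

    def step(self, closure=None):
        # clip the joint norm of all parameters across all param groups, then delegate to Adam
        parameters = [p for group in self.param_groups for p in group["params"]]
        torch.nn.utils.clip_grad_norm_(parameters, self.max_grad_norm)
        return super().step(closure)

Whichever of the three options above is chosen, such a wrapper can then be passed as the inner optimizer to hivemind.Optimizer.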

@@ -0,0 +1,45 @@
# Training PPO with decentralized averaging

This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
Member commented:

nit: i believe PPO is on-policy

foksly (Collaborator, PR author) commented:

I also have the same belief :)
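
To make the setup from the quoted README paragraph concrete, one plausible way to wire stable-baselines3 PPO to hivemind.Optimizer is to replace the optimizer that SB3 builds for the policy. This is a sketch under those assumptions (run_id, target_batch_size, and the hyperparameters are placeholders; the actual example in this PR may be structured differently):

import hivemind
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# standard Atari preprocessing for Breakout, roughly as in rl-baselines3-zoo
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", n_envs=8), n_stack=4)
model = PPO("CnnPolicy", env, learning_rate=2.5e-4, n_steps=128, batch_size=256)

dht = hivemind.DHT(start=True)  # follower peers pass initial_peers of the first peer

# wrap the Adam instance SB3 already created so that peers average parameters
# once the collaboration accumulates target_batch_size samples
model.policy.optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",            # placeholder experiment name
    optimizer=model.policy.optimizer,
    batch_size_per_step=model.batch_size,
    target_batch_size=16_384,         # placeholder; tune for ~30 s between averaging rounds
    verbose=True,
)

model.learn(total_timesteps=10_000_000)

The per-peer PPO loop is unchanged; only the optimizer's step() now also participates in decentralized averaging.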

justheuristic (Member) commented:

[just in case] feel free to ping me if you need any help with black / isort

Labels: none yet
Projects: Better tutorials (Awaiting triage)
Linked issues: none yet
3 participants