
Value of Epsilon Decay Period #201

Open
rfali opened this issue Nov 11, 2022 · 3 comments

rfali commented Nov 11, 2022

In the TF version of DQN, the value of epsilon_decay_period is set to 1M steps (see here), and for Rainbow, the value is set to 250k steps (see here).

However, the Rainbow paper says that for DQN, epsilon is annealed over 4M frames (i.e. 1M steps, as done in Dopamine above), and, importantly, that without Noisy Nets (which is the case for TF Rainbow), epsilon is annealed over the first 250K frames. That is frames, not steps: with the standard frame skip of 4, 250K frames is 62,500 steps.
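The frame/step arithmetic above can be checked with a short sketch. It assumes a frame skip of 4 (one agent step per 4 emulator frames), as stated in the issue; `frames_to_steps` is a hypothetical helper, not a Dopamine function.

```python
FRAME_SKIP = 4  # assumed standard Atari frame skip: one agent step per 4 frames


def frames_to_steps(frames: int, frame_skip: int = FRAME_SKIP) -> int:
    """Convert a number of emulator frames to agent steps (hypothetical helper)."""
    if frames % frame_skip:
        raise ValueError("frame count is not a multiple of frame_skip")
    return frames // frame_skip


# DQN in the Rainbow paper: epsilon annealed over 4M frames.
dqn_decay_steps = frames_to_steps(4_000_000)
# Rainbow without Noisy Nets: epsilon annealed over the first 250K frames.
rainbow_decay_steps = frames_to_steps(250_000)

print(dqn_decay_steps, rainbow_decay_steps)
```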

Is there a discrepancy here (Rainbow should anneal within 62.5k steps, not 250k steps), or am I misunderstanding something (or does it perhaps not matter)? Thank you for your time.

[Screenshot of page 4 of the Rainbow paper]

Author

rfali commented Nov 11, 2022

Also, for the JAX Full Rainbow agent (which has Noisy Nets): when Noisy Nets are used, epsilon-greedy exploration should be disabled (as in the paper snippet above, and in some other implementations such as Kaixhin's Rainbow here and here). However, JAX Full Rainbow still sets epsilon_train to 0.01 (here), and if Noisy Nets are enabled, the identity_epsilon function is called, which just returns the epsilon value instead of 0.
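A minimal sketch contrasting the two behaviors. `identity_epsilon` below only mirrors what the comment describes (return epsilon unchanged), while `noisy_nets_epsilon`, `epsilon_for` and `epsilon_greedy` are hypothetical helpers, not Dopamine functions; the four-argument schedule signature is also an assumption.

```python
import random


def identity_epsilon(unused_decay_period, unused_step, unused_warmup_steps, epsilon):
    # Mirrors the reported behavior: epsilon is returned unchanged, so the
    # agent keeps exploring with epsilon_train (e.g. 0.01) even with Noisy Nets.
    return epsilon


def noisy_nets_epsilon(unused_decay_period, unused_step, unused_warmup_steps, unused_epsilon):
    # Hypothetical: with Noisy Nets, exploration comes from the noisy layers,
    # so epsilon-greedy is disabled by always returning 0.
    return 0.0


def epsilon_for(noisy, epsilon_fn, decay_period, step, warmup_steps, epsilon):
    # Hypothetical: override the configured schedule when Noisy Nets are on.
    fn = noisy_nets_epsilon if noisy else epsilon_fn
    return fn(decay_period, step, warmup_steps, epsilon)


def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


eps = epsilon_for(True, identity_epsilon, 250_000, 10, 20_000, 0.01)
print(eps, epsilon_greedy([0.1, 0.5, 0.2], eps))
```

Since `rng.random() < 0.0` is never true, an epsilon of 0 makes action selection purely greedy, leaving all exploration to the noisy layers.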

Collaborator

psc-g commented Nov 28, 2022

Thank you for pointing this out! This has been fixed here: ed92c57

Author

rfali commented Nov 29, 2022

Thanks! As for

Is there a discrepancy here (Rainbow should anneal within 62k steps and not 250k steps), or am I misunderstanding something (or perhaps it really doesn't matter?)

Should the epsilon_decay_period value for TF Rainbow (which does not use Noisy Nets) be 250k frames as in the Rainbow paper (i.e. 62,500 steps with frame_skip=4), or 250k steps (as in the current implementation), or does it perhaps not matter? I have rarely seen an epsilon decay period as short as 62,500 steps; for example, RLlib also uses 200k for its DQN variant, and turns epsilon-greedy exploration off when Noisy Nets are used.
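To see how far apart the two readings are, here is a sketch of a linear epsilon schedule modeled on Dopamine's `linearly_decaying_epsilon` (reproduced from memory, so treat it as an approximation). Assumptions: a final epsilon of 0.01, and a warmup offset of 0 for simplicity.

```python
import numpy as np


def linearly_decaying_epsilon(decay_period, step, warmup_steps, epsilon):
    """Decay linearly from 1.0 to `epsilon` over `decay_period` steps after warmup."""
    steps_left = decay_period + warmup_steps - step
    bonus = (1.0 - epsilon) * steps_left / decay_period
    bonus = np.clip(bonus, 0.0, 1.0 - epsilon)  # hold at 1.0 before, epsilon after
    return float(epsilon + bonus)


EPSILON_TRAIN = 0.01  # assumed final epsilon
WARMUP_STEPS = 0      # assumed: ignore the replay warmup offset

for step in (0, 62_500, 125_000, 250_000):
    paper = linearly_decaying_epsilon(62_500, step, WARMUP_STEPS, EPSILON_TRAIN)
    current = linearly_decaying_epsilon(250_000, step, WARMUP_STEPS, EPSILON_TRAIN)
    print(f"step {step:>7}: 62.5k schedule {paper:.4f}, 250k schedule {current:.4f}")
```

Under these assumptions, at step 62,500 the 62.5k-step schedule has already reached 0.01, while the 250k-step schedule is still at about 0.75; both reach 0.01 by step 250,000.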
