Reproduce C51 training results reported in the paper #186

Open
Bowen-He opened this issue Oct 28, 2021 · 4 comments

Comments

@Bowen-He

Hi, I'm trying to use Dopamine to replicate the published C51 result on Breakout, which seems to be around 700. However, my training runs get stuck around 400-500 after a rapid initial increase in reward. I'm using the hyperparameter file c51_icml.gin and only changed the game name in it. Could you suggest what might be wrong?

@mgbellemare
Collaborator

Hi, are these the results from Figure 14 in the 2017 paper? Those were evaluated with no-op starts, while the Dopamine results use sticky actions (IIUC). Hope that helps.
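To make the "sticky actions" part concrete, here is a minimal toy sketch in Python. `StickyActions` is a hypothetical helper, not Dopamine's or the ALE's API, and the default repeat probability of 0.25 is an assumption taken from the value proposed by Machado et al. (2018). The sketch makes one sticky/non-sticky decision per call; a real emulator may apply the check at a finer granularity.

```python
import random

NOOP = 0  # ALE convention: action index 0 is NOOP


class StickyActions:
    """Hypothetical toy wrapper illustrating sticky actions.

    With probability `p`, the agent's chosen action is ignored and the
    previously executed action is repeated instead.
    """

    def __init__(self, p=0.25, seed=None):
        # Assumed default: 0.25 as proposed by Machado et al. (2018);
        # check your environment's actual setting.
        self.p = p
        self.rng = random.Random(seed)
        self.last_action = NOOP

    def reset(self):
        # Each episode starts with NOOP as the "previous" action.
        self.last_action = NOOP

    def select(self, agent_action):
        """Return the action that is actually executed this step."""
        if self.rng.random() >= self.p:
            # Not sticky this time: the agent's choice goes through.
            self.last_action = agent_action
        # Otherwise last_action is left unchanged, i.e. repeated.
        return self.last_action


# In this toy model, turning sticky actions off corresponds to p = 0.
plain = StickyActions(p=0.0, seed=0)
print([plain.select(a) for a in [1, 2, 3]])  # prints [1, 2, 3]
```

With sticky actions on, the same policy can execute different action sequences from the same state, which is one reason scores obtained under the two protocols are not directly comparable.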

@Bowen-He
Author

Oh, thanks for your reply! I'm just trying to replicate the training results and investigate the performance gap between DQN and C51. I checked the c51_icml.gin file used for training, and it says "sticky_action = False", so I guess my training results should already be based on no-ops.
I have the curves for three random seeds here. They've been trained for around 15M steps, which is still far from the 200M steps reported. Still, I would expect the curves to rise quickly to about 500 and then climb gradually to 600. So far, the agents reach about 400 and seem to plateau from there on. Would you suggest waiting for more steps before checking the results?
[Screenshots from 2021-10-29: training curves for the three seeds]
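The thread mixes "steps" and "frames", which matters when judging how far along a run is. The sketch below assumes a frame skip (action repeat) of 4, the usual Atari setting for DQN and C51, though not checked against this particular config. The function names are hypothetical helpers for illustration only.

```python
# Assumption: each agent step repeats the chosen action for 4 emulator
# frames (frame skip of 4). Adjust FRAME_SKIP if your config differs.
FRAME_SKIP = 4
BUDGET_FRAMES = 200_000_000  # the 200M-frame budget mentioned in this thread


def steps_to_frames(agent_steps, frame_skip=FRAME_SKIP):
    """Convert agent steps to emulator frames."""
    return agent_steps * frame_skip


def frames_to_steps(frames, frame_skip=FRAME_SKIP):
    """Convert emulator frames to whole agent steps."""
    return frames // frame_skip


def progress(count, unit, budget_frames=BUDGET_FRAMES, frame_skip=FRAME_SKIP):
    """Fraction of the frame budget consumed so far."""
    if unit == "steps":
        frames = steps_to_frames(count, frame_skip)
    elif unit == "frames":
        frames = count
    else:
        raise ValueError(f"unit must be 'steps' or 'frames', got {unit!r}")
    return frames / budget_frames


print(frames_to_steps(BUDGET_FRAMES))  # 50000000 agent steps in total
print(progress(15_000_000, "steps"))   # 0.3 if "15M" counts agent steps
print(progress(15_000_000, "frames"))  # 0.075 if "15M" counts frames
```

Under this assumption, a run at 15M is somewhere between 7.5% and 30% of the way through, depending on which unit the logs report, so a plateau at this point says little about the final score.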

@mgbellemare
Collaborator

Yes, you'll have to wait for the full 200M frames. In this case, there is no no-op evaluation (it's not implemented in Dopamine). No-ops shouldn't make a noticeable difference to the final score in most cases.
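For reference, "no-op starts" evaluation usually means that each evaluation episode begins with a random number of NOOP actions (commonly up to 30), so the agent does not always face the same initial state. Since Dopamine doesn't implement it, here is a minimal sketch assuming a gym-like `reset()`/`step()` interface. `ToyEnv`, `noop_reset`, and the 1 to 30 range are illustrative assumptions, not Dopamine code.

```python
import random

NOOP = 0        # ALE convention: action index 0 is NOOP
MAX_NOOPS = 30  # assumed "up to 30 no-ops" evaluation setting


class ToyEnv:
    """Hypothetical stand-in for an Atari environment, for illustration only."""

    def __init__(self, episode_length=1000):
        self.episode_length = episode_length
        self.t = 0
        self.actions = []

    def reset(self):
        self.t = 0
        self.actions = []
        return self.t  # the "observation" is just the step counter

    def step(self, action):
        self.t += 1
        self.actions.append(action)
        done = self.t >= self.episode_length
        return self.t, 0.0, done


def noop_reset(env, rng, max_noops=MAX_NOOPS):
    """Reset, then take a random number (1..max_noops) of NOOP actions."""
    obs = env.reset()
    num_noops = rng.randint(1, max_noops)  # randint is inclusive on both ends
    for _ in range(num_noops):
        obs, _, done = env.step(NOOP)
        if done:  # very short episode: start over
            obs = env.reset()
    return obs, num_noops


rng = random.Random(0)
env = ToyEnv()
obs, n = noop_reset(env, rng)
print(env.actions == [NOOP] * n)  # prints True
```

The evaluation loop would then run the trained policy from the returned observation as usual; only the starting state distribution changes.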

@Bowen-He
Author

Bowen-He commented Nov 1, 2021

OK, let me finish all the training steps and check the final scores! Would you mind if we keep this issue open? I think the training is going to take some time to finish.
