Reproduce C51 training results published by the paper #186

Bowen-He · 2021-10-28T20:24:53Z

Hi, I'm trying to use Dopamine to replicate the published C51 result on Breakout, which seems to be around 700. However, my training trials get stuck around 400-500 after a rapid initial increase in reward. I'm using the hyperparameter file c51_icml.gin and only changed the game name in it. Could you please give me some suggestions about what might be wrong?

mgbellemare · 2021-10-29T18:27:29Z

Hi, are these the results from Figure 14 in the 2017 paper? Those were evaluated with no-ops, while the Dopamine results use sticky actions (IIUC). Hope that helps.
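For readers comparing the two protocols mentioned above, here is a minimal sketch of how they differ. It is only an illustration, not Dopamine's or ALE's code: the function names are hypothetical, the repeat probability of 0.25 and the cap of 30 no-ops are the commonly cited defaults, and the exact no-op range (0 to 30 here) is an assumption. Real sticky actions are applied per emulator frame; this sketch applies them per agent decision for simplicity.

```python
import random


def sticky_step(prev_action, chosen_action, rng, repeat_prob=0.25):
    """Return the action actually executed under sticky actions.

    With probability `repeat_prob` the previous action is repeated
    instead of the one the agent just chose (hypothetical helper).
    """
    if prev_action is not None and rng.random() < repeat_prob:
        return prev_action
    return chosen_action


def noop_start_length(rng, max_noops=30):
    """Sample how many no-op actions precede the agent's first action.

    Assumption: uniform over 0..max_noops inclusive.
    """
    return rng.randint(0, max_noops)


if __name__ == "__main__":
    rng = random.Random(0)
    # Agent keeps choosing action 1 while the previous action was 0.
    executed = [sticky_step(0, 1, rng) for _ in range(10_000)]
    print("fraction of repeated actions:", executed.count(0) / len(executed))
    print("no-ops before this episode:", noop_start_length(rng))
```

The point of the sketch is that no-op starts only randomize the beginning of an episode, while sticky actions add noise throughout it, so scores obtained under the two protocols are not directly comparable.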

Bowen-He · 2021-10-29T19:28:44Z

OHHHH, thanks for your reply! I'm just trying to replicate the training results and investigate the performance gap between DQN and C51. I just checked the c51_icml.gin file used for training, and it says "sticky_action = False", so I guess the training results I have should already be based on no-ops.
I have the curves for three random seeds here. They've been trained for around 15M steps, which is still far from the 200M steps reported. But I think the curves should increase rapidly to 500 and then climb gradually to 600. For now, the curves show the agents getting to 400 and seeming to plateau from then on. Would you suggest that I wait for more steps before checking the results?

mgbellemare · 2021-10-30T15:59:46Z

Yes, you'll have to wait for the full 200M frames. In this case, there is no no-op evaluation (it's not implemented in Dopamine). No-ops shouldn't make a noticeable difference to the final score in most cases.
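As a rough way to gauge how far along a run is, the sketch below converts between emulator frames and agent steps. It assumes the standard Atari frame skip of 4 (so 200M frames correspond to 50M agent steps); the helper names are hypothetical, and the thread does not say whether the "15M steps" above counts agent steps or frames, so both readings are shown.

```python
FRAME_SKIP = 4  # assumption: each agent step spans 4 emulator frames


def frames_to_agent_steps(frames, frame_skip=FRAME_SKIP):
    """Convert emulator frames to agent steps."""
    return frames // frame_skip


def progress(agent_steps_done, total_frames=200_000_000, frame_skip=FRAME_SKIP):
    """Fraction of the full training budget completed so far."""
    return agent_steps_done / frames_to_agent_steps(total_frames, frame_skip)


if __name__ == "__main__":
    print(frames_to_agent_steps(200_000_000))                 # 50000000
    # Reading 1: 15M counts agent steps.
    print(progress(15_000_000))                               # 0.3
    # Reading 2: 15M counts emulator frames.
    print(progress(frames_to_agent_steps(15_000_000)))        # 0.075
```

Under either reading the run is well short of the full budget, which is consistent with the advice to wait for the remaining training.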

Bowen-He · 2021-11-01T01:46:24Z

OK, let me finish all the training steps and see the final scores! Would you mind if we keep this issue open? I think it's going to take some time to finish the training.
