(Solved) No env.reset() at the end of each training epoch. #67

Open
slDeng1003 opened this issue Mar 11, 2024 · 2 comments

slDeng1003 commented Mar 11, 2024

Existing code:
The environment is reset only at the beginning of the training loop; that is, env.reset() is called only in the first epoch.
The (possibly) correct training paradigm:
I checked OpenAI Spinning Up's implementation of PPO, https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py, and it resets the env at the end of each epoch (which is equivalent to resetting it at the beginning of each epoch).

Correct me if I'm wrong :)

P.S.: It's still nice code!

@ZheruiHuang

Hello! I think the training code is logically the same as OpenAI's.

Maybe you are misled by these two similar loops: https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/ppo/ppo.py#L299 and

for t in range(1, max_ep_len+1):
In the former (OpenAI's) implementation, the loop runs more than one episode: it calls reset whenever an episode is done, without leaving the loop. In the latter (this repo's) implementation, the loop runs only one episode: when the episode is done, it breaks out of the loop, and the env is reset before the next episode begins.
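
To make the comparison concrete, here is a minimal sketch of the two loop shapes. It is not the code of either repository: `ToyEnv`, `multi_episode_loop`, and `single_episode_loop` are hypothetical names made up for illustration, the toy env ends an episode after a fixed number of steps, and placing the reset at the top of each episode in the second loop is an assumption based on the description above. The counters only check that neither loop ever steps an env that has already finished.

```python
class ToyEnv:
    """Hypothetical stand-in env: an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.finished = True   # must be reset before the first step
        self.resets = 0        # number of reset() calls
        self.stale_steps = 0   # steps taken on a finished episode (should stay 0)

    def reset(self):
        self.t = 0
        self.finished = False
        self.resets += 1
        return self.t

    def step(self):
        if self.finished:
            self.stale_steps += 1
        self.t += 1
        self.finished = self.t >= self.horizon
        return self.t, self.finished


def multi_episode_loop(env, epochs, steps_per_epoch, max_ep_len):
    """Spinning Up-like shape: one inner loop spans several episodes."""
    env.reset()  # single reset before the first epoch
    ep_len = 0
    for _ in range(epochs):
        for t in range(steps_per_epoch):
            _, done = env.step()
            ep_len += 1
            terminal = done or ep_len == max_ep_len
            epoch_ended = t == steps_per_epoch - 1
            if terminal or epoch_ended:
                # reset inside the loop, without breaking out of it
                env.reset()
                ep_len = 0


def single_episode_loop(env, episodes, max_ep_len):
    """Per-episode shape: the inner loop covers exactly one episode."""
    for _ in range(episodes):
        env.reset()  # assumed placement: reset before each episode begins
        for t in range(1, max_ep_len + 1):
            _, done = env.step()
            if done:
                break  # leave the loop; the next iteration resets the env


env_a = ToyEnv(horizon=5)
multi_episode_loop(env_a, epochs=3, steps_per_epoch=12, max_ep_len=10)

env_b = ToyEnv(horizon=5)
single_episode_loop(env_b, episodes=4, max_ep_len=10)

print(env_a.stale_steps, env_b.stale_steps)
```

In both shapes every episode starts from a freshly reset env; the only difference is whether the reset happens inside the step loop or between two runs of it.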

Hope it makes sense to you!

@slDeng1003
Author

Dear Huang,
I appreciate your reply. I have checked the code and found that you are right.
Thank you again for your help!👍
@ZheruiHuang

slDeng1003 changed the title from "No env.reset() at the end of each training epoch." to "(Solved) No env.reset() at the end of each training epoch." on Apr 18, 2024