
[Question] Can't solve Gymnasium FrozenLake-v1 8x8 with A2C #1670

Open
MetallicaSPA opened this issue Sep 8, 2023 · 9 comments
Labels: openai gym (related to OpenAI Gym interface), question (Further information is requested)

Comments

@MetallicaSPA

❓ Question

Hello, I'm trying to solve the FrozenLake-v1 environment with is_slippery=True (non-deterministic) using the Stable Baselines3 A2C algorithm. I can solve the 4x4 version, but I can't achieve any results with the 8x8 version. I also checked the RL Zoo to see if there is any hyperparameter tuning for that environment, but there is nothing. What adjustments can I make to make it work properly?
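
For reference, a minimal sketch of the setup being described, assuming an out-of-the-box A2C agent (the actual script is not shown in the issue):

```python
import gymnasium as gym
from stable_baselines3 import A2C

# 8x8 slippery FrozenLake with default A2C hyperparameters
env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```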

MetallicaSPA added the "question" label Sep 8, 2023
@araffin (Member) commented Sep 8, 2023

Hello,

> What adjustments can I make to make it work properly?

Have you tried other algorithms?
Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)
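
For context, the zoo's hyperparameter tuning is built on Optuna. Below is a minimal standalone sketch of the same idea; the search space, training budget, and trial count are illustrative assumptions, not the zoo's actual settings.

```python
import gymnasium as gym
import optuna
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
    # Illustrative search space (an assumption, not the zoo's actual one)
    model = A2C(
        "MlpPolicy",
        env,
        gamma=trial.suggest_float("gamma", 0.9, 0.9999),
        ent_coef=trial.suggest_float("ent_coef", 1e-4, 0.1, log=True),
        learning_rate=trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
    )
    model.learn(total_timesteps=50_000)
    # Mean reward equals success rate for FrozenLake (reward 1 only at the goal)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=50)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```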

araffin added the "openai gym" label Sep 8, 2023
@MetallicaSPA (Author)

> Have you tried other algorithms? Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)

I tried DQN without any luck. I tried modifying the size of the net (policy and value) and the entropy and value coefficients for the A2C algorithm. Someone in this post mentioned that a tabular Q-learning method would be more efficient than DQN or A2C. I'll check the hyperparameter tuning anyway, but if anyone can point me in the right direction that would be great. Thanks in advance.
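
For readers, the adjustments mentioned above map onto A2C's constructor arguments. A hedged sketch (the exact sizes and coefficients tried are not stated in the thread; the values below are placeholders):

```python
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)

# Placeholder values; the thread does not say which were actually tried.
model = A2C(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),  # policy / value net sizes
    ent_coef=0.01,  # entropy bonus coefficient
    vf_coef=0.5,    # value-loss coefficient
    verbose=1,
)
model.learn(total_timesteps=100_000)
```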

@araffin (Member) commented Sep 9, 2023

By the way, what do you mean exactly by solving? a reward always equal to 1?

@MetallicaSPA (Author)

> By the way, what do you mean exactly by solving? a reward always equal to 1?

Solving the environment means reaching the goal state.
By the way, I implemented tabular Q-learning and it can solve FrozenLake in the symbolic version I implemented (with extra rewards at each step; think of it as FrozenLake with reward shaping). I still have no clue why a simpler algorithm is able to perform better than A2C, which is supposed to be the better one.

@araffin (Member) commented Sep 10, 2023

> Solving the environment means reaching the goal state.

yes, but always or at least in some cases?
Also, is the env supposed to be deterministic? I've observed stochastic behavior...

@araffin (Member) commented Sep 10, 2023

> I still have no clue why a simpler algorithm is able to perform better than A2C, which is supposed to be the better one.

Simpler doesn't mean worse; tabular Q-learning is tailored for that env.
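
For comparison, a minimal tabular Q-learning sketch for this env (epsilon-greedy with one-step updates; the hyperparameters are illustrative assumptions, not the poster's):

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
n_states, n_actions = env.observation_space.n, env.action_space.n
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative values

for episode in range(100_000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # One-step Q-learning update; bootstrap only if the episode continues
        target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
```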

@MetallicaSPA (Author)

> > Solving the environment means reaching the goal state.
>
> yes, but always or at least in some cases? Also, is the env supposed to be deterministic? I've observed stochastic behavior...

I'm using the non-deterministic version of the env (is_slippery=True), and the reward-shaped tabular Q-learning solves it around 60 times out of 100. With regular Q-learning, none; same with A2C.

@araffin (Member) commented Sep 11, 2023

With those commands, I managed to get ~60% success.

a2c.yaml:

```yaml
FrozenLake-v1:
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  n_envs: 8
```

Command:

```bash
CUDA_VISIBLE_DEVICES= OMP_NUM_THREADS=1 python3 -m rl_zoo3.train --algo a2c --env FrozenLake-v1 --verbose 1 -c a2c.yaml --n-eval-envs 5 --eval-episodes 10 -P -params gamma:0.999 ent_coef:0.01 --env-kwargs map_name:"'8x8'" is_slippery:True --log-interval 1000
```
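
For readers who don't use the zoo CLI, a rough standalone equivalent of that configuration (a sketch under assumptions: the zoo's evaluation callbacks, wrappers, and logging are omitted):

```python
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

env_kwargs = dict(map_name="8x8", is_slippery=True)
vec_env = make_vec_env("FrozenLake-v1", n_envs=8, env_kwargs=env_kwargs)

model = A2C("MlpPolicy", vec_env, gamma=0.999, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=1_000_000)

# Success rate equals mean reward here, since reward is 1 only on reaching the goal.
eval_env = gym.make("FrozenLake-v1", **env_kwargs)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100)
print(f"success rate ~= {mean_reward:.2f}")
```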

@MetallicaSPA (Author)

> With those commands, I managed to get ~60% success. [...]

Thank you for your reply! I'll try it and see if I can replicate these results. Anyway, I think this should be added to the RL Zoo repo.
