
[Question] Can't solve Gymnasium FrozenLake-v1 8x8 with A2C #1670

Open
MetallicaSPA opened this issue Sep 8, 2023 · 9 comments
Labels: openai gym (related to OpenAI Gym interface), question (Further information is requested)

Comments

@MetallicaSPA

❓ Question

Hello, I'm trying to solve the FrozenLake-v1 environment with is_slippery=True (non-deterministic) using the Stable Baselines3 A2C algorithm. I can solve the 4x4 version, but I can't achieve any results with the 8x8 version. I also checked the RL Zoo to see if there is any hyperparameter tuning for that environment, but there is nothing. What adjustments can I make to make it work properly?
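
For reference, a minimal sketch of the setup being described, assuming an out-of-the-box A2C agent (the actual script is not shown in the issue):

```python
import gymnasium as gym
from stable_baselines3 import A2C

# 8x8 slippery FrozenLake with default A2C hyperparameters
env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```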

MetallicaSPA added the "question" label Sep 8, 2023
@araffin (Member) commented Sep 8, 2023

Hello,

> What adjustments can I make to make it work properly?

Have you tried other algorithms?
Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)
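
For context, the zoo's hyperparameter tuning is built on Optuna. Below is a minimal standalone sketch of the same idea; the search space, training budget, and trial count are illustrative assumptions, not the zoo's actual settings.

```python
import gymnasium as gym
import optuna
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
    # Illustrative search space (an assumption, not the zoo's actual one)
    model = A2C(
        "MlpPolicy",
        env,
        gamma=trial.suggest_float("gamma", 0.9, 0.9999),
        ent_coef=trial.suggest_float("ent_coef", 1e-4, 0.1, log=True),
        learning_rate=trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
    )
    model.learn(total_timesteps=50_000)
    # Mean reward equals success rate for FrozenLake (reward 1 only at the goal)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=50)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```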

araffin added the "openai gym" label Sep 8, 2023
@MetallicaSPA (Author)

> Have you tried other algorithms? Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)

I tried DQN without any luck. I tried modifying the size of the net (policy and value) and the entropy and value coefficients for the A2C algorithm. Someone in this post mentioned that a tabular Q-learning method would be more efficient than DQN or A2C. I'll check the hyperparameter tuning anyway, but if anyone can point me in the right direction that would be great. Thanks in advance.
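
For readers, the adjustments mentioned above map onto A2C's constructor arguments. A hedged sketch (the exact sizes and coefficients tried are not stated in the thread; the values below are placeholders):

```python
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)

# Placeholder values; the thread does not say which were actually tried.
model = A2C(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),  # policy / value net sizes
    ent_coef=0.01,  # entropy bonus coefficient
    vf_coef=0.5,    # value-loss coefficient
    verbose=1,
)
model.learn(total_timesteps=100_000)
```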

@araffin (Member) commented Sep 9, 2023

By the way, what do you mean exactly by solving? a reward always equal to 1?

@MetallicaSPA (Author)

> By the way, what do you mean exactly by solving? a reward always equal to 1?

Solving the environment means reaching the goal state.
By the way, I implemented tabular Q-learning and it can solve FrozenLake in the symbolic version I implemented (with extra rewards at each step; think of it as FrozenLake with reward shaping). I still have no clue why a simpler algorithm is able to perform better than A2C, which is supposed to be the better one.

@araffin (Member) commented Sep 10, 2023

> Solving the environment means reaching the goal state.

yes, but always or at least in some cases?
Also, is the env supposed to be deterministic? I've observed stochastic behavior...

@araffin (Member) commented Sep 10, 2023

> I still have no clue why a simpler algorithm is able to perform better than A2C, which is supposed to be the better one.

Simpler doesn't mean worse; tabular Q-learning is tailored for that env.
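
For comparison, a minimal tabular Q-learning sketch for this env (epsilon-greedy with one-step updates; the hyperparameters are illustrative assumptions, not the poster's):

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
n_states, n_actions = env.observation_space.n, env.action_space.n
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative values

for episode in range(100_000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # One-step Q-learning update; bootstrap only if the episode continues
        target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
```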

@MetallicaSPA (Author)

> > Solving the environment means reaching the goal state.
>
> yes, but always or at least in some cases? Also, is the env supposed to be deterministic? I've observed stochastic behavior...

I'm using the non-deterministic version of the env (is_slippery=True), and the reward-shaped tabular Q-learning solves it around 60 times out of 100. With regular Q-learning, none; same with A2C.

@araffin (Member) commented Sep 11, 2023

With those commands, I managed to get ~60% success.

a2c.yaml:

```yaml
FrozenLake-v1:
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  n_envs: 8
```

Command:

```bash
CUDA_VISIBLE_DEVICES= OMP_NUM_THREADS=1 python3 -m rl_zoo3.train --algo a2c --env FrozenLake-v1 --verbose 1 -c a2c.yaml --n-eval-envs 5 --eval-episodes 10 -P -params gamma:0.999 ent_coef:0.01 --env-kwargs map_name:"'8x8'" is_slippery:True --log-interval 1000
```
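
For readers who don't use the zoo CLI, a rough standalone equivalent of that configuration (a sketch under assumptions: the zoo's evaluation callbacks, wrappers, and logging are omitted):

```python
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

env_kwargs = dict(map_name="8x8", is_slippery=True)
vec_env = make_vec_env("FrozenLake-v1", n_envs=8, env_kwargs=env_kwargs)

model = A2C("MlpPolicy", vec_env, gamma=0.999, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=1_000_000)

# Success rate equals mean reward here, since reward is 1 only on reaching the goal.
eval_env = gym.make("FrozenLake-v1", **env_kwargs)
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100)
print(f"success rate ~= {mean_reward:.2f}")
```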

@MetallicaSPA (Author)

> With those commands, I managed to get ~60% success. [...]

Thank you for your reply! I'll try it and see if I can replicate these results. Anyway, I think this should be added to the RL Zoo repo.
