AX seems to get stuck with Ray #2364

Closed
Balandat opened this issue Apr 13, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@Balandat
Contributor

Discussed in https://github.com/facebook/Ax/discussions/2341

Originally posted by zhqrbitee April 9, 2024
Hi,
I'm hitting a weird situation where AX sometimes seems to get stuck (e.g. pending for 2 days) while printing tons of messages like the ones below.

[INFO 04-08 23:04:18] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:19] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:20] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:21] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:22] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.

Meanwhile, Ray is complaining that it is not getting new trials. I'm a bit confused here: even if AX skips model fitting, it should still be able to tell Ray the next point to sample.

2024-04-08 23:04:23,794	WARNING insufficient_resources_manager.py:163 -- Ignore this message if the cluster is autoscaling. No trial is running and no new trial has been started within the last 60 seconds. This could be due to the cluster not having enough resources available. You asked for 1.0 CPUs and 0 GPUs per trial, but the cluster only has 48.0 CPUs and 0 GPUs available. Stop the tuning and adjust the required resources (e.g. via the `ScalingConfig` or `resources_per_trial`, or `num_workers` for rllib), or add more resources to your cluster.

I'm using AX 3.7.0 with Ray 2.8.0. Sadly, I cannot reproduce this with a simple example, but any suggestions would be appreciated.
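
For anyone trying to confirm a similar stall from the log alone, one rough approach is to count how often the "Skipping model fitting" line repeats and over what time span. This is only a sketch using the standard library. It assumes log lines keep the `[INFO MM-DD HH:MM:SS] logger: message` shape shown above, and `summarize_skips` is a hypothetical helper, not part of Ax or Ray.

```python
import re
from datetime import datetime

# Matches lines such as:
# [INFO 04-08 23:04:18] ax.modelbridge.torch: The observations are identical ...
LOG_LINE = re.compile(
    r"^\[INFO (?P<stamp>\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<logger>[\w.]+): (?P<message>.*)$"
)
SKIP_MARKER = "Skipping model fitting"


def summarize_skips(lines, year=2024):
    """Return (count, first_seen, last_seen) for 'Skipping model fitting' lines.

    The log stamps carry no year, so `year` is an assumption supplied by the caller.
    Lines that do not match the INFO pattern (e.g. pandas warnings) are ignored.
    """
    stamps = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and SKIP_MARKER in match["message"]:
            stamps.append(
                datetime.strptime(f"{year}-{match['stamp']}", "%Y-%m-%d %H:%M:%S")
            )
    if not stamps:
        return 0, None, None
    return len(stamps), min(stamps), max(stamps)
```

Feeding it the lines of a captured log file (for example `summarize_skips(open("tune.log"))`, where `tune.log` is a placeholder name) gives the number of skipped fits and the first and last time they were seen, which can help show whether the loop has been repeating for seconds or for days.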

@Cesar-Cardoso Cesar-Cardoso self-assigned this Apr 22, 2024
@Cesar-Cardoso
Contributor

This should be resolved after #2318. Can you try 0.4.0 and see if the issue persists?
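
To check which version is installed before retrying, something like the following may help. It is a sketch that assumes Ax was installed from PyPI under the distribution name `ax-platform`. `parse_release` and `needs_upgrade` are hypothetical helpers, and the parsing only looks at the leading dotted numbers of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn the leading dotted numbers of a version string into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric release found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(installed, minimum="0.4.0"):
    """True when `installed` is older than `minimum`."""
    return parse_release(installed) < parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("ax-platform")
    except PackageNotFoundError:
        print("ax-platform is not installed in this environment")
    else:
        print(installed, "needs upgrade:", needs_upgrade(installed))
```

If the check reports that an upgrade is needed, `pip install -U ax-platform` in the same virtual environment is the usual way to move to a newer release.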

@Cesar-Cardoso Cesar-Cardoso added the bug Something isn't working label May 23, 2024
@zhqrbitee

zhqrbitee commented May 31, 2024

@Cesar-Cardoso sorry for the delay. It seems fixed, so you can close this issue.
