
ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93 #987

LeonTing1010 opened this issue Apr 30, 2024 · 2 comments

@LeonTing1010

What happened + What you expected to happen

(_train_tune pid=59932) /Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/integration/pytorch_lightning.py:198: ray.tune.integration.pytorch_lightning.TuneReportCallback is deprecated. Use ray.tune.integration.pytorch_lightning.TuneReportCheckpointCallback instead.
(_train_tune pid=59932) Seed set to 1
2024-05-01 01:27:11,649 ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93
Traceback (most recent call last):
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 2623, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=59932, ip=127.0.0.1, actor_id=b48464a8f9278052285d8c3c01000000, repr=_train_tune)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 330, in train
raise skipped from exception_cause(skipped)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/_internal/util.py", line 98, in run
self._ret = self._target(*self._args, **self._kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 45, in
training_func=lambda: self._trainable_func(self.config),
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 253, in _trainable_func
output = fn()
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 130, in inner
return trainable(config, **fn_kwargs)
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 209, in _train_tune
_ = self._fit_model(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 357, in _fit_model
model = model.fit(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_multivariate.py", line 537, in fit
return self._fit(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_model.py", line 218, in _fit
trainer = pl.Trainer(**model.trainer_kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
return fn(self, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 431, in init
self._callback_connector.on_trainer_init(
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
_validate_callbacks_list(self.trainer.callbacks)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in _validate_callbacks_list
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
raise ValueError("Expected a parent")
ValueError: Expected a parent
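
For context on the failing check: pytorch_lightning's is_overridden("state_dict", instance=cb) tries to infer a parent class from the instance's type (LightningModule, LightningDataModule, or Callback) and raises ValueError("Expected a parent") when the instance matches none of them. Below is a minimal sketch of that failure mode; the NotACallback class is hypothetical and only serves to trigger the same validation path, it is not the callback neuralforecast actually passes.

import pytorch_lightning as pl

# Hypothetical object that defines state_dict() but does not subclass
# pytorch_lightning.Callback, so is_overridden() cannot infer a parent class.
class NotACallback:
    def state_dict(self):
        return {}

# Trainer validates its callback list with is_overridden("state_dict", instance=cb);
# for an instance that is neither a LightningModule, a LightningDataModule, nor a
# Callback, no parent can be inferred and ValueError("Expected a parent") is raised.
trainer = pl.Trainer(callbacks=[NotACallback()])

One way to hit this with real callbacks is having both the lightning and pytorch_lightning distributions installed, so a callback subclasses one package's Callback while the Trainer validates against the other's.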

Versions / Dependencies

Name: neuralforecast
Version: 1.7.1
Summary: Time series forecasting suite using deep learning models
Home-page: https://github.com/Nixtla/neuralforecast/
Author: Nixtla
Author-email: business@nixtla.io
License: Apache Software License 2.0

Reproduction script

Y_hat_df = nf.cross_validation(df=Y_train_df,
                               val_size=val_size,
                               test_size=test_size,
                               n_windows=None)

Issue Severity

High: It blocks me from completing my task.

@LeonTing1010 (Author)

from neuralforecast.auto import AutoTSMixer, AutoTSMixerx
from ray.tune.search.hyperopt import HyperOptSearch
from ray import tune
from neuralforecast.losses.numpy import mse, mae
import matplotlib.pyplot as plt
import pandas as pd

from datasetsforecast.long_horizon import LongHorizon
from neuralforecast.core import NeuralForecast
from neuralforecast.models import TSMixer, TSMixerx, NHITS, MLPMultivariate, iTransformer
from neuralforecast.losses.pytorch import MSE, MAE

# Change this to your own data to try the model

Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# X_df contains the exogenous features, which we add to Y_df

X_df['ds'] = pd.to_datetime(X_df['ds'])
Y_df = Y_df.merge(X_df, on=['unique_id', 'ds'], how='left')

# We make validation and test splits

n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)
horizon = 96
input_size = 512

tsmixer_config = {
    "input_size": input_size,                      # Size of input window
    "max_steps": tune.choice([500, 1000, 2000]),   # Number of training iterations
    "val_check_steps": 100,                        # Compute validation every x steps
    "early_stop_patience_steps": 5,                # Early stopping steps
    "learning_rate": tune.loguniform(1e-4, 1e-2),  # Initial learning rate
    "n_block": tune.choice([1, 2, 4, 6, 8]),       # Number of mixing layers
    "dropout": tune.uniform(0.0, 0.99),            # Dropout
    "ff_dim": tune.choice([32, 64, 128]),          # Dimension of the feature linear layer
    "scaler_type": 'identity',
}

tsmixerx_config = tsmixer_config.copy()
tsmixerx_config['futr_exog_list'] = ['ex_1', 'ex_2', 'ex_3', 'ex_4']
modelx = AutoTSMixerx(h=horizon,
                      n_series=7,
                      loss=MAE(),
                      config=tsmixerx_config,
                      num_samples=10,
                      search_alg=HyperOptSearch(),
                      backend='ray',
                      valid_loss=MAE())

nf = NeuralForecast(models=[modelx], freq='15min')
Y_hat_df = nf.cross_validation(df=Y_df, val_size=val_size,
                               test_size=test_size, n_windows=None)
print(nf.models[0].results.get_best_result().config)
y_true = Y_hat_df.y.values
y_hat_tsmixerx = Y_hat_df['AutoTSMixerx'].values

print(f'MAE TSMixerx: {mae(y_hat_tsmixerx, y_true):.3f}')
print(f'MSE TSMixerx: {mse(y_hat_tsmixerx, y_true):.3f}')

@elephaint (Contributor)

Thanks. This is strange: when I run your code, it completes without any issues.

Can you share more details about your machine configuration (OS, Python version)? How are you running this script?

If I had to guess, it's a package conflict. I would create a fresh virtual environment, install neuralforecast in it, and rerun the script.
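
A quick, non-authoritative way to check for that kind of conflict before rebuilding the environment is to print the relevant package versions and verify that Ray's Lightning callback subclasses the same Callback base class the Trainer validates against. The issubclass check below is only an assumption about where the mismatch might be, not a confirmed diagnosis.

import importlib.metadata as md

# Print the versions of the packages most likely involved in a conflict.
for pkg in ("neuralforecast", "ray", "pytorch-lightning", "lightning", "torch"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")

import pytorch_lightning as pl
from ray.tune.integration.pytorch_lightning import TuneReportCheckpointCallback

# If this prints False, the callback comes from a different Lightning distribution
# than the Trainer uses, which would explain ValueError("Expected a parent").
print(issubclass(TuneReportCheckpointCallback, pl.Callback))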
