
ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93 #987

LeonTing1010 opened this issue Apr 30, 2024 · 2 comments

@LeonTing1010

What happened + What you expected to happen

(_train_tune pid=59932) /Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/integration/pytorch_lightning.py:198: ray.tune.integration.pytorch_lightning.TuneReportCallback is deprecated. Use ray.tune.integration.pytorch_lightning.TuneReportCheckpointCallback instead.
(_train_tune pid=59932) Seed set to 1
2024-05-01 01:27:11,649 ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93
Traceback (most recent call last):
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 2623, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=59932, ip=127.0.0.1, actor_id=b48464a8f9278052285d8c3c01000000, repr=_train_tune)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 330, in train
raise skipped from exception_cause(skipped)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/_internal/util.py", line 98, in run
self._ret = self._target(*self._args, **self._kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 45, in
training_func=lambda: self._trainable_func(self.config),
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 253, in _trainable_func
output = fn()
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 130, in inner
return trainable(config, **fn_kwargs)
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 209, in _train_tune
_ = self._fit_model(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 357, in _fit_model
model = model.fit(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_multivariate.py", line 537, in fit
return self._fit(
File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_model.py", line 218, in _fit
trainer = pl.Trainer(**model.trainer_kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
return fn(self, **kwargs)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 431, in init
self._callback_connector.on_trainer_init(
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
_validate_callbacks_list(self.trainer.callbacks)
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in _validate_callbacks_list
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
raise ValueError("Expected a parent")
ValueError: Expected a parent
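
For context on the failing check: pytorch_lightning's is_overridden("state_dict", instance=cb) tries to infer a parent class from the instance's type (LightningModule, LightningDataModule, or Callback) and raises ValueError("Expected a parent") when the instance matches none of them. Below is a minimal sketch of that failure mode; the NotACallback class is hypothetical and only serves to trigger the same validation path, it is not the callback neuralforecast actually passes.

import pytorch_lightning as pl

# Hypothetical object that defines state_dict() but does not subclass
# pytorch_lightning.Callback, so is_overridden() cannot infer a parent class.
class NotACallback:
    def state_dict(self):
        return {}

# Trainer validates its callback list with is_overridden("state_dict", instance=cb);
# for an instance that is neither a LightningModule, a LightningDataModule, nor a
# Callback, no parent can be inferred and ValueError("Expected a parent") is raised.
trainer = pl.Trainer(callbacks=[NotACallback()])

One way to hit this with real callbacks is having both the lightning and pytorch_lightning distributions installed, so a callback subclasses one package's Callback while the Trainer validates against the other's.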

Versions / Dependencies

Name: neuralforecast
Version: 1.7.1
Summary: Time series forecasting suite using deep learning models
Home-page: https://github.com/Nixtla/neuralforecast/
Author: Nixtla
Author-email: business@nixtla.io
License: Apache Software License 2.0

Reproduction script

Y_hat_df = nf.cross_validation(df=Y_train_df,
                               val_size=val_size,
                               test_size=test_size,
                               n_windows=None)

Issue Severity

High: It blocks me from completing my task.

@LeonTing1010 (Author)

from neuralforecast.auto import AutoTSMixer, AutoTSMixerx
from ray.tune.search.hyperopt import HyperOptSearch
from ray import tune
from neuralforecast.losses.numpy import mse, mae
import matplotlib.pyplot as plt
import pandas as pd

from datasetsforecast.long_horizon import LongHorizon
from neuralforecast.core import NeuralForecast
from neuralforecast.models import TSMixer, TSMixerx, NHITS, MLPMultivariate, iTransformer
from neuralforecast.losses.pytorch import MSE, MAE

# Change this to your own data to try the model

Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# X_df contains the exogenous features, which we add to Y_df

X_df['ds'] = pd.to_datetime(X_df['ds'])
Y_df = Y_df.merge(X_df, on=['unique_id', 'ds'], how='left')

# We make validation and test splits

n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)
horizon = 96
input_size = 512

tsmixer_config = {
    "input_size": input_size,                      # Size of input window
    "max_steps": tune.choice([500, 1000, 2000]),   # Number of training iterations
    "val_check_steps": 100,                        # Compute validation every x steps
    "early_stop_patience_steps": 5,                # Early stopping steps
    "learning_rate": tune.loguniform(1e-4, 1e-2),  # Initial learning rate
    "n_block": tune.choice([1, 2, 4, 6, 8]),       # Number of mixing layers
    "dropout": tune.uniform(0.0, 0.99),            # Dropout
    "ff_dim": tune.choice([32, 64, 128]),          # Dimension of the feature linear layer
    "scaler_type": 'identity',
}

tsmixerx_config = tsmixer_config.copy()
tsmixerx_config['futr_exog_list'] = ['ex_1', 'ex_2', 'ex_3', 'ex_4']
modelx = AutoTSMixerx(h=horizon,
                      n_series=7,
                      loss=MAE(),
                      config=tsmixerx_config,
                      num_samples=10,
                      search_alg=HyperOptSearch(),
                      backend='ray',
                      valid_loss=MAE())

nf = NeuralForecast(models=[modelx], freq='15min')
Y_hat_df = nf.cross_validation(df=Y_df, val_size=val_size,
                               test_size=test_size, n_windows=None)
print(nf.models[0].results.get_best_result().config)
y_true = Y_hat_df.y.values
y_hat_tsmixerx = Y_hat_df['AutoTSMixerx'].values

print(f'MAE TSMixerx: {mae(y_hat_tsmixerx, y_true):.3f}')
print(f'MSE TSMixerx: {mse(y_hat_tsmixerx, y_true):.3f}')

@elephaint (Contributor)

Thanks. This is strange: when I run your code, it completes without any issues.

Can you share more details about your machine configuration (OS, Python version)? How are you running this script?

If I had to guess, it's a package conflict. I would create a fresh virtual environment, install neuralforecast in it, and rerun the script.
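
A quick, non-authoritative way to check for that kind of conflict before rebuilding the environment is to print the relevant package versions and verify that Ray's Lightning callback subclasses the same Callback base class the Trainer validates against. The issubclass check below is only an assumption about where the mismatch might be, not a confirmed diagnosis.

import importlib.metadata as md

# Print the versions of the packages most likely involved in a conflict.
for pkg in ("neuralforecast", "ray", "pytorch-lightning", "lightning", "torch"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")

import pytorch_lightning as pl
from ray.tune.integration.pytorch_lightning import TuneReportCheckpointCallback

# If this prints False, the callback comes from a different Lightning distribution
# than the Trainer uses, which would explain ValueError("Expected a parent").
print(issubclass(TuneReportCheckpointCallback, pl.Callback))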
