hyperparameter TiDE and TFT no complete trials #2363

Closed
flight505 opened this issue May 1, 2024 · 3 comments
Labels
question Further information is requested

Comments

@flight505
Hi, I don't think this is a bug, but I can't trace the error.
It might be due to the data itself, as it is multivariate and grouped by ID.
The score is returning [nan, nan, nan, nan, ...], so trial = study.best_trial fails with no completed trials.
The val_loss=120 is very high, but I think it should still return a validation score.
Could it be that the model is not correctly using all the data and is treating it as a single series?

# Imports assumed for this snippet (Darts, Optuna, PyTorch Lightning);
# `scaler`, `train_scaled`, `val_scaled` come from the data preparation further below
import torch
import optuna
from pytorch_lightning.callbacks import EarlyStopping
from darts.dataprocessing.transformers import Scaler
from darts.models import TiDEModel
from darts.metrics import mse


def objective(trial):
    # Suggest values for the hyperparameters
    decoder_output_dim = trial.suggest_int("decoder_output_dim", 15, 45, step=10)
    hidden_size = trial.suggest_categorical("hidden_size", [64, 128, 256, 512])
    dropout = trial.suggest_float("dropout", 0.05, 0.55, step=0.05)

    # Define early stopping criteria
    early_stop_callback = EarlyStopping(
        monitor="val_loss", min_delta=1e-2, patience=5, verbose=True, mode="min"
    )

    # Learning rate scheduler
    lr_scheduler_cls = torch.optim.lr_scheduler.ExponentialLR
    lr_scheduler_kwargs = {
        "gamma": 0.999,
    }

    # Update model args with suggested values
    model_args = {
        "input_chunk_length": 45,
        "output_chunk_length": 5,
        "decoder_output_dim": decoder_output_dim,
        "hidden_size": hidden_size,
        "dropout": dropout,
        "random_state": 42,
        "use_static_covariates": True,
        "use_reversible_instance_norm": True,
        "lr_scheduler_cls": lr_scheduler_cls,
        "lr_scheduler_kwargs": lr_scheduler_kwargs,
        "pl_trainer_kwargs": {
            "gradient_clip_val": 1,
            "accelerator": "auto",
            "max_epochs": 60,
            "callbacks": [early_stop_callback],
        },
    }

    # Create and fit the model
    model = TiDEModel(**model_args)
    model.fit(series=train_scaled, val_series=val_scaled, verbose=True)

    # Evaluate the model
    val_pred = model.predict(n=len(val_scaled[0]), series=val_scaled, verbose=True)
    
    # Inverse transform the predictions and actual values
    val_pred_inverse = scaler.inverse_transform(val_pred)
    val_actual_inverse = scaler.inverse_transform(val_scaled)

    score = mse(val_actual_inverse, val_pred_inverse)
    print(f"Validation MSE: {score}")
    return score


# Create a study object and optimize the objective function
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
print("Number of trials:", len(study.trials))

try:
    print("Best trial:")
    trial = study.best_trial

    print("  Value:", trial.value)
    print("  Params:")
    for key, value in trial.params.items():
        print(f"    {key}: {value}")
except ValueError:
    print("No trials are completed yet.")

The TimeSeries objects are created with .from_dataframe rather than .from_group_dataframe, as the latter was producing a few errors.

# Imports assumed for this snippet (Darts, scikit-learn, NumPy)
import numpy as np
from darts import TimeSeries
from sklearn.model_selection import train_test_split

# Create a list of unique patient IDs
unique_ids = processed_df["unique_id"].unique()

# Get 40 unique IDs for testing
# unique_ids = unique_ids[:40]

# Split the unique IDs into training, validation, and test sets
train_ids, val_test_ids = train_test_split(unique_ids, test_size=0.2, random_state=42)
val_ids, test_ids = train_test_split(val_test_ids, test_size=0.5, random_state=42)

# Create TimeSeries objects for training, validation, and test sets
train_list = []
val_list = []
test_list = []

for unique_id in unique_ids:
    patient_data = processed_df[processed_df["unique_id"] == unique_id]
    # Get the first row for the patient's static covariates
    patient_static_covariates = patient_data[["sex", "age"]].iloc[0]
    patient_series = TimeSeries.from_dataframe(
        patient_data,
        time_col=None,  # Use the DataFrame's index as the time index
        value_cols=component_names,
        static_covariates=patient_static_covariates,
        freq=1,  # Specify the frequency as an integer value
    )

    if unique_id in train_ids:
        train_list.append(patient_series)
    elif unique_id in val_ids:
        val_list.append(patient_series)
    else:
        test_list.append(patient_series)

# Convert the TimeSeries objects to float32
train_list = [ts.astype(np.float32) for ts in train_list]
val_list = [ts.astype(np.float32) for ts in val_list]
test_list = [ts.astype(np.float32) for ts in test_list]

print(f"Length of train_list: {len(train_list)}")
print(f"Length of val_list: {len(val_list)}")
print(f"Length of test_list: {len(test_list)}")
@madtoinou
Collaborator

Hi @flight505,

In general, NaNs in the forecasts and scores are caused by missing values in the TimeSeries. Can you verify whether any of your series contain such values? Checking at the dataframe level is not sufficient, as new timestamps will be added by Darts if they are missing from the index.

You can also check the FAQ section of the documentation.
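For example, a quick check over the lists built above (a sketch; missing_values_ratio and gaps() are part of Darts' public API):

# Sketch: report NaNs and time-index gaps for every series
from darts.utils.missing_values import missing_values_ratio

for name, series_list in [("train", train_list), ("val", val_list), ("test", test_list)]:
    for i, ts in enumerate(series_list):
        ratio = missing_values_ratio(ts)  # fraction of missing values
        gaps = ts.gaps()  # DataFrame of gaps in the time index
        if ratio > 0 or len(gaps) > 0:
            print(f"{name}[{i}]: missing ratio={ratio:.3f}, gaps:\n{gaps}")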

@madtoinou madtoinou added the question Further information is requested label May 3, 2024
@flight505
Author

flight505 commented May 3, 2024

Hey @madtoinou, thank you for reaching out. It appears that during the evaluation there were issues with the varying lengths of the patient series. I included a custom SMAPE and ran the evaluation over the patient series; however, the validation loss is quite high. Ideally, I intended for the model to train across patients rather than on individual series. The data is multivariate and includes patient IDs as groups, which I assume also makes it multi-series, and I am unsure about the exact approach to handle this. I was using TFT with pytorch-forecasting, which was a bit easier to work with, but it seems it is no longer maintained. Any insights you could provide would be greatly appreciated.
This is the current code:

# Imports assumed for this snippet (Darts, Optuna, PyTorch Lightning, NumPy)
import numpy as np
import torch
import optuna
from optuna.integration import PyTorchLightningPruningCallback
from pytorch_lightning.callbacks import EarlyStopping
from darts.dataprocessing.transformers import Scaler
from darts.models import TiDEModel

# Scale the series
scaler = Scaler()
train_scaled = scaler.fit_transform(train_list)
val_scaled = scaler.transform(val_list)

def custom_smape(actual, forecast):
    min_length = min(len(actual), len(forecast))
    actual = actual[:min_length]
    forecast = forecast[:min_length]
    numerator = np.abs(forecast - actual)
    denominator = np.abs(actual) + np.abs(forecast)
    ratio = numerator / denominator
    return 2 * np.mean(ratio)

# print some optimization trials information
def print_callback(study, trial):
    print(f"Current value: {trial.value}, Current params: {trial.params}")
    print(f"Best value: {study.best_value}, Best params: {study.best_trial.params}")

# define objective function
def objective(trial):

    # Suggest values for the hyperparameters
    decoder_output_dim = trial.suggest_int("decoder_output_dim", 16, 64, step=10)
    hidden_size = trial.suggest_categorical("hidden_size", [32, 64, 128, 256, 512, 1024])
    dropout = trial.suggest_float("dropout", 0.1, 0.7, step=0.1)
    output_chunk_shift = trial.suggest_int("output_chunk_shift", 1, 5)
    num_encoder_layers = trial.suggest_int("num_encoder_layers", 1, 3)
    num_decoder_layers = trial.suggest_int("num_decoder_layers", 1, 3)

    pruner = PyTorchLightningPruningCallback(trial, monitor="val_loss")
    early_stopper = EarlyStopping("val_loss", min_delta=0.001, patience=3, mode="min", verbose=True)
    callback = [pruner, early_stopper]

    # Model and trainer configurations
    optimizer_kwargs = {"lr": 1e-5}
    pl_trainer_kwargs = {
        "gradient_clip_val": 0.1,
        "max_epochs": 200,
        "accelerator": "auto",
        "callbacks": callback,
    }
    lr_scheduler_cls = torch.optim.lr_scheduler.ExponentialLR
    lr_scheduler_kwargs = {"gamma": 0.999}

    # Common model arguments
    common_model_args = {
        "input_chunk_length": 40,
        "output_chunk_length": 10,
        "decoder_output_dim": decoder_output_dim,
        "output_chunk_shift": output_chunk_shift,
        "num_encoder_layers": num_encoder_layers,
        "num_decoder_layers": num_decoder_layers,
        "hidden_size": hidden_size,
        "dropout": dropout,
        "optimizer_kwargs": optimizer_kwargs,
        "pl_trainer_kwargs": pl_trainer_kwargs,
        "lr_scheduler_cls": lr_scheduler_cls,
        "lr_scheduler_kwargs": lr_scheduler_kwargs,
        "likelihood": None,
        "save_checkpoints": True,
        "force_reset": True,
        "random_state": 42,
    }

    # Instantiate and fit the model
    model = TiDEModel(**common_model_args, use_reversible_instance_norm=True, use_static_covariates=True)
    model.fit(series=train_scaled, val_series=val_scaled, verbose=True)

    # Predict and handle multiple series
    val_preds = model.predict(n=10, series=val_scaled, verbose=True)
    scores = []

    for idx, val_pred in enumerate(val_preds):
        if val_pred.pd_dataframe().isna().any().any():
            return float('inf')  # Assign a high penalty for NaN predictions

        val_pred_unscaled = scaler.inverse_transform(val_pred)
        actual = val_scaled[idx]
        score = custom_smape(actual.pd_dataframe().values, val_pred_unscaled.pd_dataframe().values)
        scores.append(score)

    # Calculate average SMAPE across all predictions
    average_score = np.mean(scores)
    if np.isnan(average_score):
        return float('inf')  # Assign a high penalty for NaN average score
    
    return average_score  # Use average SMAPE as the optimization objective


# Create a study object and optimize the objective function
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=40, callbacks=[print_callback])


print("Number of trials:", len(study.trials))
try:
    print("Best trial:")
    trial = study.best_trial
    print("  Value:", trial.value)
    print("  Params:")
    for key, value in trial.params.items():
        print(f"    {key}: {value}")
except ValueError:
    print("No trials are completed yet.")

@dennisbader
Collaborator

dennisbader commented May 13, 2024

Hi @flight505, I haven't checked your entire code (not really a minimal reproducible example :) ), but here are some things that are causing the large errors:

Inverse transformation of val_preds: If train_list is a list of series, then scaler.fit_transform(train_list) will fit a dedicated Scaler per series.
So when you inverse transform, you need to give a list with the same order as train_list to properly transform the data.

E.g.:

val_preds = model.predict(n=10, series=val_scaled, verbose=True)
val_preds = scaler.inverse_transform(val_preds)

Forecasts: model.predict(..., series=val_scaled, ...) will generate a forecast per series that starts after the end of the input series. When you compute the SMAPE between (val_scaled, val_preds), there will be 0 overlapping time steps. This doesn't error out because you slice the series with:

actual = actual[:min_length]
forecast = forecast[:min_length]

resulting in actual and forecast coming from completely different time steps (here we see a lot of the pitfalls that can happen when working with time series data; if you use Darts, we handle this under the hood for you).

If val_list is the future of train_list, then you would have to call model.predict(..., series=train_scaled, ...) for this to work.
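Put together, the evaluation loop would look roughly like this (an illustrative sketch under that assumption, reusing train_scaled, val_list, scaler, model, and np from the code above):

from darts.metrics import smape

# Forecast the steps following each training series
val_preds = model.predict(n=10, series=train_scaled, verbose=True)
# Inverse transform the whole list at once to preserve the fitting order
val_preds = scaler.inverse_transform(val_preds)

# Darts metrics align actual and forecast on their common time steps
scores = [smape(actual, pred) for actual, pred in zip(val_list, val_preds)]
average_score = float(np.mean(scores))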

Also, for your metric to work properly (and fast!), try to implement it similarly to the Darts metrics (e.g. here, including the decorators and the call to _get_values_or_raise()).
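A simplified version of that idea, without relying on Darts' private helpers, aligns the two series on their time intersection first (a sketch; the real Darts metrics additionally handle lists of series and multivariate components via decorators):

import numpy as np
from darts import TimeSeries

def smape_on_intersection(actual_series: TimeSeries, pred_series: TimeSeries) -> float:
    # Compare only the overlapping time steps of the two series
    actual = actual_series.slice_intersect(pred_series).values()
    pred = pred_series.slice_intersect(actual_series).values()
    # Note: no protection against an all-zero denominator here
    return float(200.0 * np.mean(np.abs(pred - actual) / (np.abs(actual) + np.abs(pred))))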
