
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags #1565

Open
MrGG14 opened this issue May 11, 2024 · 0 comments


MrGG14 commented May 11, 2024

I'm working on predicting the energy price for the next 24 hours. The problem is that I don't understand how the datasets are created. I want to predict the next 24 values; I don't mind whether it uses just the previous 24 hours or more — I just don't know how to configure it.

All help is appreciated. This is my dataframe:

fechaHora precio_spot demanda co2 precio_gas prod_eolica prod_solar demanda_residual rampa month week index day
0 2022-08-31 00:00:00 227.82 26649.83 80.28 204.32 6061.25 232.17 21193.03 1750.55 0 0 0 0
1 2022-08-31 01:00:00 195.00 25480.42 80.28 204.32 5636.58 145.58 19582.33 1610.70 0 0 1 0
2 2022-08-31 02:00:00 184.95 24435.00 80.28 204.32 4902.33 113.83 18842.22 740.10 0 0 2 0
3 2022-08-31 03:00:00 181.69 23810.17 80.28 204.32 4227.33 62.58 19194.65 -352.43 0 0 3 0
4 2022-08-31 04:00:00 181.79 23520.92 80.28 204.32 3933.25 18.33 19861.95 -667.30 0 0 4 0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
13676 2024-03-22 19:00:00 57.30 29866.00 59.65 26.43 5725.08 257.67 23398.45 -3844.93 19 82 13676 569
13677 2024-03-22 20:00:00 53.92 30719.25 59.65 26.43 6607.83 80.17 24598.68 -1200.23 19 82 13677 569
13678 2024-03-22 21:00:00 35.00 29757.92 59.65 26.43 7931.58 64.00 22840.10 1758.58 19 82 13678 569
13679 2024-03-22 22:00:00 29.16 27183.58 59.65 26.43 9437.92 64.00 18698.83 4141.28 19 82 13679 569
13680 2024-03-22 23:00:00 15.60 24874.83 59.65 26.43 10415.50 63.00 14931.05 3767.78 19 82 13680 569

Expected behavior

Create training, validation and test data.

Actual behavior

AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

Code to reproduce the problem


import pandas as pd
import pytorch_forecasting
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

features = [col for col in data.columns if col != 'precio_spot']  # feature columns  # and col != 'fechaHora'

max_prediction_length = 24
max_encoder_length = 24 #48
# training_cutoff = data["fechaHora"].max() - pd.Timedelta(hours=max_encoder_length)

training = TimeSeriesDataSet(
    train_data,
    time_idx="index",
    target="precio_spot",
    group_ids=["day"],
    min_encoder_length=24,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=24,
    max_prediction_length=max_prediction_length,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[], # group of categorical variables can be treated as one variable
    time_varying_known_reals=features,
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[],
    target_normalizer=GroupNormalizer(
        groups=["day"], transformation="softplus"
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    categorical_encoders={
        'month': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        'week': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        'day': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
    },
)

validation = TimeSeriesDataSet.from_dataset(training, val_data, predict=True, stop_randomization=True)
test = TimeSeriesDataSet.from_dataset(training, test_data, predict=True, stop_randomization=True)

# create dataloaders for model
batch_size = 64  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=11, persistent_workers=True)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=11, persistent_workers=True)
test_dataloader = test.to_dataloader(train=False, batch_size=batch_size, num_workers=11, persistent_workers=True)
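A likely cause of the assertion — an assumption worth verifying against the data, not a confirmed diagnosis — is that with `group_ids=["day"]`, `TimeSeriesDataSet` never builds a sample that crosses a group boundary, so every "day" group must contain at least `min_encoder_length + min_prediction_length = 48` rows, while each day in an hourly frame has only 24. The check can be sketched with a hypothetical two-day frame shaped like the one above:

```python
import pandas as pd

# Hypothetical frame mirroring the data above: hourly rows, 24 per "day" group.
df = pd.DataFrame({
    "index": range(48),
    "day": [0] * 24 + [1] * 24,
})

# With group_ids=["day"], every training window must fit inside one group,
# so each group needs at least min_encoder_length + min_prediction_length rows.
required = 24 + 24
group_sizes = df.groupby("day")["index"].count()
print(group_sizes.min(), required, group_sizes.min() >= required)  # 24 48 False
```

If that is indeed the cause, grouping by an identifier that spans the whole history (e.g. a constant series-id column) instead of by day would leave each group with enough rows to host 48-step encoder/decoder windows.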
