While working on forecasting performance metrics for musicians, I ran into problems training high-quality models.
About data and preprocessing
The source data contains about 30 auxiliary time series and about 10 static features for each item_id. All NaNs in the dataset are filled, zeros in the target are imputed from various types of distributions (the nature of the data requires filling the zeros), and the target is then smoothed.
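To make the preprocessing concrete, here is a minimal sketch of the zero-imputation and smoothing steps described above (the function name, the choice of a normal distribution fitted to the non-zero values, and the smoothing parameter are my illustrative assumptions, not the exact pipeline):

```python
import numpy as np
import pandas as pd

def fill_zeros_and_smooth(y: pd.Series, alpha: float = 0.5, seed: int = 0) -> pd.Series:
    """Replace zeros in the target with draws from a normal distribution
    fitted to the non-zero values, then apply exponential smoothing."""
    rng = np.random.default_rng(seed)
    nonzero = y[y > 0]
    # Fit a normal distribution to the non-zero part of the series
    mu, sigma = nonzero.mean(), nonzero.std()
    filled = y.copy().astype(float)
    zero_mask = y == 0
    draws = rng.normal(mu, sigma, size=int(zero_mask.sum()))
    filled[zero_mask] = np.clip(draws, 0.0, None)  # keep imputed values non-negative
    # Simple exponential smoothing with the given alpha
    return filled.ewm(alpha=alpha, adjust=False).mean()

y = pd.Series([0.0, 10.0, 0.0, 12.0, 11.0, 0.0])
print(fill_zeros_and_smooth(y).round(2))
```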
This is what the target looks like after processing:
Training
For each data preprocessing option, I tried to train the PatchTST model, but the results were always very poor (MAPE in the hundreds), and the models always produced roughly the same picture: an almost linear trend with random fluctuations around it.
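One observation of my own that may be relevant here: with ~81% zeros in the raw target, MAPE is dominated by near-zero actual values, so even a forecast that is off by a small constant can score in the hundreds. A quick sketch (the numbers are illustrative, not from my data):

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# A forecast that is off by a constant 1.0 everywhere
actual = np.array([100.0, 120.0, 0.1, 0.05, 110.0])
forecast = actual + 1.0

# The two near-zero actuals dominate the average and push MAPE into the hundreds
print(round(mape(actual, forecast), 1))  # → 600.5
```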
For clarity, I trained PatchTST with only the target as input (zeros filled from a normal distribution, then exponential smoothing with alpha = 0.5). The result looked like this:
With the full set of inputs the picture is similar; the metrics are nowhere near acceptable.
An ensemble trained with the presets="high_quality" setting shows similarly alarming results:
Additional info
Adding to the problem, there is no way to inspect the training error curve to understand how training is progressing. It is also worrying that the PatchTST model (which has attention layers) takes less than an hour to train on an Nvidia V100 GPU (1.5 hours when searching over 30 hyperparameter variants) and consumes only about 400 MB of GPU memory, which raises a suspicion of underfitting.
Here is the AutoGluon train preparation output:
=================== System Info ===================
AutoGluon Version: 1.1.0
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024
CPU Count: 8
GPU Count: 1
Memory Avail: 25.62 GB / 47.04 GB (54.5%)
Disk Space Avail: 94.61 GB / 295.20 GB (32.1%)
===================================================
Fitting with arguments:
{'enable_ensemble': False,
'eval_metric': MAPE,
'freq': 'D',
'hyperparameter_tune_kwargs': {'num_trials': 30,
'scheduler': 'local',
'searcher': 'random'},
'hyperparameters': {'PatchTST': {'context_length': Categorical[64, 96, 128],
'patch_len': Categorical[8, 16, 32],
'stride': Categorical[4, 8, 16]}},
'known_covariates_names': [],
'num_val_windows': 1,
'prediction_length': 365,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': False,
'skip_model_selection': False,
'target': 'daily_streams',
'verbosity': 2}
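For reference, the fit call that produces the arguments logged above presumably looks something like this (a sketch reconstructed from the log; variable names and data loading are assumptions):

```python
from autogluon.timeseries import TimeSeriesPredictor
from autogluon.common import space

predictor = TimeSeriesPredictor(
    target="daily_streams",
    prediction_length=365,
    freq="D",
    eval_metric="MAPE",
    quantile_levels=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
)

predictor.fit(
    train_data,  # TimeSeriesDataFrame built from the preprocessed data
    enable_ensemble=False,
    num_val_windows=1,
    random_seed=123,
    hyperparameters={
        "PatchTST": {
            "context_length": space.Categorical(64, 96, 128),
            "patch_len": space.Categorical(8, 16, 32),
            "stride": space.Categorical(4, 8, 16),
        }
    },
    hyperparameter_tune_kwargs={
        "num_trials": 30,
        "scheduler": "local",
        "searcher": "random",
    },
)
```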
Provided train_data has 6520892 rows, 3834 time series. Median time series length is 1965 (min=462, max=2436).
Removing 259 short time series from train_data. Only series with length >= 731 will be used for training.
After filtering, train_data has 6366137 rows, 3575 time series. Median time series length is 1965 (min=731, max=2436).
... and train data info:
Items count: 3834
Min group len: 462
Max group len: 2436
Mean group len: 1700.81
Median group len: 1965.0
Zeros in target: 81.40% (filled during preprocessing)
Many thanks to anyone who can help make sense of this situation. I will be happy to provide additional materials on the case upon request.