While working on forecasting performance metrics for musicians, I ran into problems training high-quality models.
About data and preprocessing
The source data contains about 30 auxiliary time series and about 10 static features for each item_id. All NaNs in the dataset are filled, zeros in the target are imputed from various types of distributions (the nature of the data requires filling the zeros), and the target is then smoothed.
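To make the preprocessing concrete, here is a minimal sketch of the zero-imputation and smoothing steps described above (the function name, the choice of a normal distribution fitted to the non-zero values, and the smoothing parameter are my illustrative assumptions, not the exact pipeline):

```python
import numpy as np
import pandas as pd

def fill_zeros_and_smooth(y: pd.Series, alpha: float = 0.5, seed: int = 0) -> pd.Series:
    """Replace zeros in the target with draws from a normal distribution
    fitted to the non-zero values, then apply exponential smoothing."""
    rng = np.random.default_rng(seed)
    nonzero = y[y > 0]
    # Fit a normal distribution to the non-zero part of the series
    mu, sigma = nonzero.mean(), nonzero.std()
    filled = y.copy().astype(float)
    zero_mask = y == 0
    draws = rng.normal(mu, sigma, size=int(zero_mask.sum()))
    filled[zero_mask] = np.clip(draws, 0.0, None)  # keep imputed values non-negative
    # Simple exponential smoothing with the given alpha
    return filled.ewm(alpha=alpha, adjust=False).mean()

y = pd.Series([0.0, 10.0, 0.0, 12.0, 11.0, 0.0])
print(fill_zeros_and_smooth(y).round(2))
```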
This is what the target looks like after processing:
Training
For each data preprocessing option, I tried to train the PatchTST model, but the results were always very poor (MAPE in the hundreds), and the models always produced roughly the same picture: an almost linear trend with random fluctuations around it.
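One observation of my own that may be relevant here: with ~81% zeros in the raw target, MAPE is dominated by near-zero actual values, so even a forecast that is off by a small constant can score in the hundreds. A quick sketch (the numbers are illustrative, not from my data):

```python
import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# A forecast that is off by a constant 1.0 everywhere
actual = np.array([100.0, 120.0, 0.1, 0.05, 110.0])
forecast = actual + 1.0

# The two near-zero actuals dominate the average and push MAPE into the hundreds
print(round(mape(actual, forecast), 1))  # → 600.5
```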
For clarity, I trained PatchTST with only the target as input (zeros filled from a normal distribution, then exponential smoothing with alpha = 0.5). The result looked like this:
With the full set of inputs the picture is similar; the metrics are nowhere near acceptable.
An ensemble trained with the presets="high_quality" setting shows similarly alarming results:
Additional info
Adding to the problem, there is no way to inspect the training error curve to understand how training is progressing. It is also worrying that the PatchTST model (which has attention layers) takes less than an hour to train on an Nvidia V100 GPU (1.5 hours when searching over 30 hyperparameter variants) and consumes only about 400 MB of GPU memory, which raises a suspicion of underfitting.
Here is the AutoGluon train preparation output:
=================== System Info ===================
AutoGluon Version: 1.1.0
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #115~20.04.1-Ubuntu SMP Mon Apr 15 17:33:04 UTC 2024
CPU Count: 8
GPU Count: 1
Memory Avail: 25.62 GB / 47.04 GB (54.5%)
Disk Space Avail: 94.61 GB / 295.20 GB (32.1%)
===================================================
Fitting with arguments:
{'enable_ensemble': False,
'eval_metric': MAPE,
'freq': 'D',
'hyperparameter_tune_kwargs': {'num_trials': 30,
'scheduler': 'local',
'searcher': 'random'},
'hyperparameters': {'PatchTST': {'context_length': Categorical[64, 96, 128],
'patch_len': Categorical[8, 16, 32],
'stride': Categorical[4, 8, 16]}},
'known_covariates_names': [],
'num_val_windows': 1,
'prediction_length': 365,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': False,
'skip_model_selection': False,
'target': 'daily_streams',
'verbosity': 2}
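For reference, the fit call that produces the arguments logged above presumably looks something like this (a sketch reconstructed from the log; variable names and data loading are assumptions):

```python
from autogluon.timeseries import TimeSeriesPredictor
from autogluon.common import space

predictor = TimeSeriesPredictor(
    target="daily_streams",
    prediction_length=365,
    freq="D",
    eval_metric="MAPE",
    quantile_levels=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
)

predictor.fit(
    train_data,  # TimeSeriesDataFrame built from the preprocessed data
    enable_ensemble=False,
    num_val_windows=1,
    random_seed=123,
    hyperparameters={
        "PatchTST": {
            "context_length": space.Categorical(64, 96, 128),
            "patch_len": space.Categorical(8, 16, 32),
            "stride": space.Categorical(4, 8, 16),
        }
    },
    hyperparameter_tune_kwargs={
        "num_trials": 30,
        "scheduler": "local",
        "searcher": "random",
    },
)
```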
Provided train_data has 6520892 rows, 3834 time series. Median time series length is 1965 (min=462, max=2436).
Removing 259 short time series from train_data. Only series with length >= 731 will be used for training.
After filtering, train_data has 6366137 rows, 3575 time series. Median time series length is 1965 (min=731, max=2436).
... and train data info:
Items count: 3834
Min group len: 462
Max group len: 2436
Mean group len: 1700.81
Median group len: 1965.0
Zeros in target: 81.40% (filled during preprocessing)
Many thanks to anyone who can help make sense of this situation. I will be happy to provide additional materials on the case upon request.