
[BUG] Chronos predictions are offset by a constant value if time series has a high mean value #4132

Open
gmanlan opened this issue Apr 24, 2024 · 3 comments
Labels: bug (Something isn't working), module: timeseries (related to the timeseries module), priority: 1 (High priority)

Comments

gmanlan commented Apr 24, 2024

Describe the bug
When using TimeSeriesPredictor on a 5min-frequency TimeSeriesDataFrame, the predictions are always offset by a roughly constant amount above the train/test/trend line. The following screenshot clearly shows the gap issue:
[Screenshot: forecast shifted above the observed series by a constant offset]

Expected behavior
Regardless of which model/data is used, I would expect TimeSeriesPredictor to start predicting at an appropriate value level (i.e. value close to the last available observation).

[Screenshot: expected behavior, with the forecast starting near the last observed value]

To Reproduce
This is the code straight from the AutoGluon TimeSeries tutorial (https://auto.gluon.ai/stable/tutorials/timeseries/forecasting-chronos.html), simply adapted to read a 5min-frequency data frame. Note that while this run uses "chronos_small" specifically, the issue is reproducible with any other kind of model (RecursiveTabular, TemporalFusionTransformer, etc.), trained or not.
The example data file is attached here: G_5min_sample.csv

from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
import matplotlib.pyplot as plt

# Load the 5-minute sample and regularize it to an explicit 5min frequency
data = TimeSeriesDataFrame("./G_5min_sample.csv")
data = data.convert_frequency(freq='5min')
data = data.fill_missing_values(method='auto')

# Hold out the last `prediction_length` steps of each series for testing
prediction_length = 24
train_data, test_data = data.train_test_split(prediction_length)

predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(
    train_data, presets="chronos_small",
)
predictions = predictor.predict(train_data)

# Plot the forecast against the full series for item "G1"
predictor.plot(
    data=data, 
    predictions=predictions, 
    item_ids=["G1"],
    max_history_length=100,
)
plt.show()

Installed Versions

INSTALLED VERSIONS
------------------
date               : 2024-04-23
time               : 16:53:32.388177
python             : 3.10.12.final.0
OS                 : Linux
OS-release         : 5.10.16.3-microsoft-standard-WSL2
machine            : x86_64
processor          : x86_64
num_cores          : 12
cpu_ram_mb         : 6873.67578125
cuda version       : 12.530.54
num_gpus           : 1
gpu_ram_mb         : [3967]
avail_disk_size_mb : 211063

accelerate : 0.21.0
autogluon : 1.1.0
autogluon.common : 1.1.0
autogluon.core : 1.1.0
autogluon.features : 1.1.0
autogluon.multimodal : 1.1.0
autogluon.tabular : 1.1.0
autogluon.timeseries : 1.1.0
boto3 : 1.34.81
catboost : 1.2.3
defusedxml : 0.7.1
evaluate : 0.4.1
fastai : 2.7.14
gluonts : 0.14.3
hyperopt : 0.2.7
imodels : None
jinja2 : 3.1.3
joblib : 1.4.0
jsonschema : 4.21.1
lightgbm : 4.1.0
lightning : 2.1.4
matplotlib : 3.8.4
mlforecast : 0.10.0
networkx : 3.3
nlpaug : 1.1.11
nltk : 3.8.1
nptyping : 2.4.1
numpy : 1.26.4
nvidia-ml-py3 : 7.352.0
omegaconf : 2.2.3
onnxruntime-gpu : None
openmim : 0.3.9
optimum : 1.18.1
optimum-intel : None
orjson : 3.10.0
pandas : 2.1.4
pdf2image : 1.17.0
Pillow : 10.3.0
psutil : 5.9.8
pytesseract : 0.3.10
pytorch-lightning : 2.1.4
pytorch-metric-learning: 1.7.3
ray : 2.10.0
requests : 2.28.2
scikit-image : 0.20.0
scikit-learn : 1.4.0
scikit-learn-intelex : None
scipy : 1.12.0
seqeval : 1.2.2
setuptools : 60.2.0
skl2onnx : None
statsforecast : 1.4.0
tabpfn : None
tensorboard : 2.16.2
text-unidecode : 1.3
timm : 0.9.16
torch : 2.1.2
torchmetrics : 1.2.1
torchvision : 0.16.2
tqdm : 4.65.2
transformers : 4.38.2
utilsforecast : 0.0.10
vowpalwabbit : None
xgboost : 2.0.3

What I have tried so far:

  • Setting freq='5min' in TimeSeriesPredictor (should have the same effect as convert_frequency + fill_missing_values); see the sketch after this list.
  • Using different 'presets' (medium_quality, high_quality), different 'models' (RecursiveTabular, TemporalFusionTransformer, Chronos, Ensemble, etc.) and different time_limits (hoping that fitting for longer would improve something).
  • Using larger and smaller data frames (all of which are 5min freq), some with missing values, others without missing values.
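
For reference, here is a minimal sketch of the first item above: declaring the frequency on the predictor and letting it handle resampling internally, instead of calling convert_frequency/fill_missing_values manually. It reuses the sample file and preset from the reproduction; the exact internal handling of freq is my assumption.

from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Declare the 5-minute frequency on the predictor itself (assumption: this
# resamples/fills missing values internally, equivalent to the manual
# convert_frequency + fill_missing_values calls above)
data = TimeSeriesDataFrame("./G_5min_sample.csv")
train_data, test_data = data.train_test_split(24)
predictor = TimeSeriesPredictor(prediction_length=24, freq="5min").fit(
    train_data, presets="chronos_small",
)
predictions = predictor.predict(train_data)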

I always observe the same "level gap" at any point in time, regardless of model and data.

The only way I can get the expected results is by using the AutoGluon tutorial dataset ("https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly_tiny/train.csv"), which has freq='H', so it is not really helpful for debugging why the 5min frequency doesn't work as expected.

Thanks!

gmanlan added the bug: unconfirmed (Something might not be working) and Needs Triage (Issue requires Triage) labels on Apr 24, 2024
shchur added the bug (Something isn't working) and module: timeseries (related to the timeseries module) labels, and removed the bug: unconfirmed and Needs Triage labels, on May 2, 2024
shchur added this to the 1.2 Release milestone on May 2, 2024
shchur added the priority: 1 (High priority) label on May 2, 2024
shchur (Collaborator) commented May 2, 2024

Hi @gmanlan, thank you for the detailed description of the issue! This problem looks quite similar to Figure 16b in the Chronos paper, so I suspect that this might be a limitation that is specific to the Chronos model and the scaling/quantization scheme that it uses.

One potential option to address it is to center the time series by subtracting the mean/median value for each time series, and then adding it back to the forecasts.

This is something that we will try to fix in the next version of Chronos, or at least handle automatically on the AutoGluon side in the v1.2 release.
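
For anyone who needs a stopgap before that release, here is a minimal sketch of the centering workaround described above, continuing from the reproduction snippet in the issue. Assumptions: the target column uses the default name "target", and per-item means are taken from the training split only; this is not an official AutoGluon feature.

# Per-item mean of the target, computed on the training data only
item_means = train_data["target"].groupby(level="item_id").mean()

# Center each series by subtracting its own mean before fitting
centered_train = train_data.copy()
centered_train["target"] = (
    train_data["target"]
    - train_data["target"].groupby(level="item_id").transform("mean")
)

predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(
    centered_train, presets="chronos_small",
)
predictions = predictor.predict(centered_train)

# Add the per-item mean back to every forecast column (mean and all quantiles)
offsets = item_means.reindex(predictions.index.get_level_values("item_id")).to_numpy()
predictions = predictions.add(offsets, axis=0)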

@shchur shchur changed the title [BUG] TimeSeriesPredictor always predicts with a constant value level gap [BUG] Chronos predictions are offset by a constant value if time series has a high mean value May 2, 2024
gmanlan (Author) commented May 2, 2024

Hi @shchur, thanks for the clarification. I considered the post-processing centering correction you suggest, but I'm afraid it won't be precise/ideal (especially when the goal is to predict only a few short-term steps). That's why I wanted to make sure I wasn't missing something here. Happy to resume testing when a new version is released. Thanks!

gmanlan (Author) commented May 2, 2024

It's also worth noting that, as mentioned in the description, the same issue happens with other models such as RecursiveTabular. For example:

predictor = TimeSeriesPredictor(prediction_length=prediction_length).fit(
    train_data,
    hyperparameters={
        # train only the RecursiveTabular model
        "RecursiveTabular": {}
    }
)

[Screenshot: RecursiveTabular forecast with a smaller, but still visible, constant offset]
While the gap is less pronounced here, the forecast still does not align with, or start at, the right level.
