
[FEATURE] Scaler with rolling/expanding window to eliminate look ahead bias #1540

Open
tuomijal opened this issue Feb 6, 2023 · 11 comments · May be fixed by #2021
Labels: feature request, improvement

Comments

@tuomijal

tuomijal commented Feb 6, 2023

Problem description
Currently, Scaler transforms the input series "globally", meaning that all values of the input series are considered when fitting:

import pandas as pd

from darts.dataprocessing.transformers import Scaler
from darts import TimeSeries

scaler = Scaler()

s = pd.Series([1.0, 2, 3, 4, 5, 6, 7, 8, 9])
s = TimeSeries.from_series(s)

s_norm = scaler.fit_transform(s)
s_norm.pd_series()

0    0.000
1    0.125
2    0.250
3    0.375
4    0.500
5    0.625
6    0.750
7    0.875
8    1.000

This is not a problem if the data is manually split into train and test sets and the scaler is fitted on the training set only.

However, if we go on to use this series as an input to historical forecasts, we risk introducing look-ahead bias into the analysis (at least when performing normalization). To eliminate this bias, a rolling-window approach is sometimes used: https://arxiv.org/abs/1907.09452.
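To make the rolling/expanding idea concrete, here is a minimal sketch in plain pandas (not darts API) of min-max scaling where each point is normalized using only data available at that time; the variable names are illustrative, not from any library:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# Expanding-window min-max scaling: each point is scaled using only
# values observed up to and including that point, so no future data
# leaks into the transform. The first point is NaN (0/0).
exp_min = s.expanding().min()
exp_max = s.expanding().max()
s_expanding = (s - exp_min) / (exp_max - exp_min)

# Rolling-window variant: only the last `window` observations are used,
# so the first window - 1 points are NaN.
window = 3
roll_min = s.rolling(window).min()
roll_max = s.rolling(window).max()
s_rolling = (s - roll_min) / (roll_max - roll_min)
```

Note that on this monotonically increasing example every scalable point maps to 1.0, since each new value is the maximum of its own window; the point is only that no future value ever enters the statistics.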

Describe proposed solution
One solution would be to add parameters to Scaler, like so:

class Scaler(InvertibleDataTransformer, FittableDataTransformer):
    def __init__(
        self, scaler=None, type="Global", window=None, name="Scaler", n_jobs: int = 1, verbose: bool = False
    ):
        """
        Parameters
        ----------
        type
            How to scale the data. Options:
            "Global" uses all data points.
            "Rolling" uses a rolling window with the window size specified by the 'window' parameter.
            "Expanding" uses an expanding window with the initial window size specified by the 'window' parameter.
        window
            Size of the window if type is either "Rolling" or "Expanding".
        """

Describe potential alternatives
Another option is to integrate this functionality into the historical_forecasts and backtest functions. This might be convenient because the parameters above could be inferred from the desired backtesting setup.

Additional context
Thank you again for excellent software!

@tuomijal added the triage label Feb 6, 2023
@dennisbader
Collaborator

dennisbader commented Feb 10, 2023

Another way would be to allow users to pass some (of our existing) transformers for the target and covariates to historical_forecasts.
Then we could simply refit and transform on each train/eval split.
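The refit-per-split idea above can be sketched in a few lines; this is a plain sklearn illustration of the pattern (not darts code), where at each historical-forecast step the scaler is refitted on the training slice only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.arange(1.0, 10.0).reshape(-1, 1)  # values 1..9

# For each step t, refit the scaler on series[:t] only, then transform
# the next point. The point at index t is never seen by the scaler
# that scales it, so there is no look-ahead bias.
scaled_next = []
for t in range(3, len(series)):
    scaler = MinMaxScaler().fit(series[:t])
    scaled_next.append(float(scaler.transform(series[t:t + 1])[0, 0]))
```

A side effect worth noting: transformed values can fall outside [0, 1] (here the first one is 1.5), because each new point exceeds the range seen during fitting; a global fit would hide exactly this.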

@hrzn
Contributor

hrzn commented Feb 10, 2023

Thanks for raising this excellent point @tuomijal. I personally like the solution of @dennisbader, as it would remove the need for users to get the windowing exactly right - they wouldn't have to worry about it, only specify which kind of scaling they want when calling historical forecasts.

@tuomijal
Author

I agree, the solution proposed by @dennisbader is the most elegant one 👍🏼

@madtoinou added the feature request and improvement labels and removed the triage label Feb 22, 2023
@hrzn hrzn added this to To do in darts via automation Feb 22, 2023
@hrzn hrzn removed this from To do in darts Feb 22, 2023
@JanFidor
Contributor

Hi @dennisbader @madtoinou ! The PR looks cool, could I pick it up?

@madtoinou
Collaborator

Hi @JanFidor,

historical_forecast() currently contains several bugs that are being fixed, and will undergo a considerable refactoring after the next release. I would recommend waiting for these changes to be merged before working on this very interesting feature.

@JanFidor
Contributor

JanFidor commented Apr 4, 2023

Sure thing! If there's a chance to avoid major merge conflicts, I'll happily take it. I'll keep my eyes peeled for the new release!

@madtoinou
Collaborator

@JanFidor

Refactoring of historical forecasts has just been merged on the main branch, if you still have time to work on this, you can go ahead!

The logic now lives in two different places, ForecastingModel.historical_forecasts and RegressionModel._optimized_historical_forecasts, so make sure to implement this in both. If needed, you can implement this feature in utils/historical_forecasts/utils.py and call it from the two methods.

@JanFidor
Contributor

Hi @madtoinou, thanks for the reminder, this issue totally slipped my mind! A small heads-up: I might have slightly less time going forward, but I'll happily give it a go. I've already browsed the RegressionModel._optimized_historical_forecasts method and noticed that it's only used with pretrained models (in which case data leakage would have to be prevented outside of historical_forecasts), so I wanted to ask whether I'm missing something or whether I should just add a Scaler as an unused parameter for now.

@madtoinou
Collaborator

Indeed, we decided to optimize this method step by step and the "retrain" logic was a bit harder to support directly.

I think that the main source of data leakage is the processing/transformation of the entire input series instead of just the part available/used for the latest historical forecast (for both retraining and/or inference). So the logic should be contained in historical_forecasts().

@j-adamczyk

Any news on this? The lack of this feature means that if backtesting or other more involved testing procedures are needed, Darts is quite unusable, since those transforms (e.g. differencing, scaling) are really necessary.

@Joseph-Foley

Yeah, I'm also quite keen on this being part of Darts. I was hoping the Pipeline class would function more like sklearn's Pipeline, where transformations and models can be bundled together. Then, if the Pipeline class had backtest / historical_forecast methods, we could be sure of no data leakage during backtesting.
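For reference, the sklearn pattern described above looks roughly like this; it is a sketch with made-up lagged features, not a darts example, showing how bundling the scaler into the pipeline and using chronological splits prevents leakage:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Build lagged features: predict y[t] from y[t-1] and y[t-2].
y = np.sin(np.arange(50) / 5.0)
X = np.column_stack([y[1:-1], y[:-2]])
target = y[2:]

# The pipeline bundles scaler + model. TimeSeriesSplit keeps folds
# chronological (train always precedes test), and cross_val_score
# refits the whole pipeline on each training fold, so the scaler
# never sees test-fold statistics.
pipe = make_pipeline(StandardScaler(), LinearRegression())
scores = cross_val_score(pipe, X, target, cv=TimeSeriesSplit(n_splits=5))
```

This is essentially what a leakage-free Pipeline.backtest would have to do internally: refit every transformer on the training slice of each split.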
