[Question] How to perform temporal embedding in DARTS? #2348

Open
guimalo opened this issue Apr 23, 2024 · 2 comments
Labels
question Further information is requested

guimalo commented Apr 23, 2024

I want to understand how I can use date information as features for my machine learning model in DARTS. I want to derive new features from dates and use them as columns in a regression-based forecaster. I'm quite confused by the covariates terminology, and it isn't clear what happens under the hood in DARTS when using add_encoders.

In my case, I do not have any exogenous variables (past/future covariates); I just want to capture seasonality using temporal embedding features, like sine and cosine, both for training and for inference. For example, let's say that I have the following data:

Date Sales
2024-04-01 100
2024-04-08 150
2024-04-15 200
2024-04-22 180
2024-04-29 220
2024-05-06 250
2024-05-13 280
2024-05-20 300
2024-05-27 320
2024-06-03 350
2024-06-10 380
2024-06-17 400

How do I go from this series to features like 'year', 'month_of_year', 'week_of_year', 'day_of_year', 'month_of_quarter', 'week_of_quarter', 'day_of_quarter', 'week_of_month' for training and inference? Is there an easy way to do this in DARTS?

I'm talking here about date features, but the documentation also does not make it clear to me how DARTS handles ML forecasting in general. The template example is as follows (here using CatBoost):

target = series['p (mbar)'][:100]
# optionally, use past observed rainfall (pretending to be unknown beyond index 100)
past_cov = series['rain (mm)'][:100]
# optionally, use future temperatures (pretending this component is a forecast)
future_cov = series['T (degC)'][:106]
# predict 6 pressure values using the 12 past values of pressure and rainfall, as well as the 6 temperature
# values corresponding to the forecasted period
model = CatBoostModel(
    lags=12,
    lags_past_covariates=12,
    lags_future_covariates=[0,1,2,3,4,5],
    output_chunk_length=6
)
model.fit(target, past_covariates=past_cov, future_covariates=future_cov)
pred = model.predict(6)

What does it mean to use the 12 past values of pressure and rainfall? What about the 88 other data points? How does the library actually do the calculations for the data?

Thank you.

@guimalo guimalo added bug Something isn't working triage Issue waiting for triaging labels Apr 23, 2024
guimalo commented Apr 23, 2024

Also, I have a question about how to create a pipeline where I deseasonalize and detrend my data, make the desired forecasts, and then invert these transformations on the forecasted data. Is there an easy way to do that?

@madtoinou madtoinou added question Further information is requested and removed bug Something isn't working triage Issue waiting for triaging labels Apr 24, 2024
madtoinou (Collaborator) commented

Hi @guimalo,

When you assign a value to add_encoders, the model creates the corresponding covariates "on the fly" during training and inference. In your case, since you are encoding information about the time axis, the encodings can be treated as future covariates (we know in advance which day of the week or month of the year each timestamp falls on, for an arbitrary number of steps ahead). You can see them as "implicit" covariates, handled for you under the hood. If you prefer, you can of course create the encoders manually and explicitly pass the returned TimeSeries as covariates:

from darts.dataprocessing.encoders.encoders import FutureCyclicEncoder
from darts.models import CatBoostModel
from darts.utils.timeseries_generation import sine_timeseries
from pandas import Timestamp

model = CatBoostModel(
    lags=[-5, -3, -1],
    output_chunk_length=2,
    lags_future_covariates=[-2, 0, 2])

encoder = FutureCyclicEncoder(
    attribute="month",
    input_chunk_length = abs(min(model._get_lags("target"))),
    output_chunk_length = model.output_chunk_length,
    lags_covariates = model._get_lags("future"),
    )

# sine_timeseries is imported directly above, so call it without the "tg." prefix
ts_target = sine_timeseries(length=100, start=Timestamp("01-01-2000"))

axis_encoding = encoder.encode_train_inference(
    n=5,
    target=ts_target
)

model.fit(ts_target, future_covariates=axis_encoding)
model.predict(5)
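
To make concrete what such date-derived covariates contain, here is a minimal pure-Python sketch that computes a few of the features asked about, plus a sine/cosine encoding of the month. It illustrates the idea only and is not Darts' internal code: the helper name `date_features`, the column names, and the ISO week convention are assumptions, and the normalization Darts applies in its cyclic encoder may differ.

```python
import math
from datetime import date


def date_features(d: date) -> dict:
    """Calendar features for one date (hypothetical helper, not a Darts API)."""
    # ISO convention: weeks start on Monday, week 1 contains the first Thursday.
    _, iso_week, _ = d.isocalendar()
    # Map month 1..12 onto a circle so that December ends up next to January.
    angle = 2 * math.pi * (d.month - 1) / 12
    return {
        "year": d.year,
        "month_of_year": d.month,
        "week_of_year": iso_week,
        "day_of_year": d.timetuple().tm_yday,
        "month_of_quarter": (d.month - 1) % 3 + 1,
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


# A few of the weekly dates from the question.
dates = [date(2024, 4, 1), date(2024, 4, 8), date(2024, 6, 17)]
rows = [date_features(d) for d in dates]

for d, row in zip(dates, rows):
    print(d.isoformat(), row)
```

Because every value depends only on the timestamp, the same function can be evaluated on future dates, which is why encodings of this kind behave like future covariates.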

You can create a Pipeline, and if your transforms are invertible, you can transform your forecast back to the original range: see the example in the documentation.
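
The "transform, forecast, invert" round trip can be sketched in plain Python, without the Darts Pipeline class. This is a minimal illustration under stated assumptions: the trend is a least-squares straight line, the forecaster on the detrended scale is a placeholder that repeats the mean residual, and `fit_linear_trend` is a hypothetical helper, not part of Darts. The sales values are the ones from the question.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


sales = [100, 150, 200, 180, 220, 250, 280, 300, 320, 350, 380, 400]
slope, intercept = fit_linear_trend(sales)

# Forward transform: remove the fitted trend from the observed series.
residuals = [v - (slope * t + intercept) for t, v in enumerate(sales)]

# Placeholder forecaster on the transformed scale: repeat the mean residual.
horizon = 3
resid_forecast = [sum(residuals) / len(residuals)] * horizon

# Inverse transform: add the trend back, evaluated at the future time steps.
forecast = [
    r + slope * (len(sales) + h) + intercept
    for h, r in enumerate(resid_forecast)
]
print(forecast)
```

The inverse step only works because the forward transform stored what it removed (here `slope` and `intercept`); a transform that discards that information cannot map forecasts back to the original range.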
