
Easy Sample Weights #1175

Open
Beerstabr opened this issue Aug 30, 2022 · 18 comments · May be fixed by #2362

Comments

@Beerstabr (Contributor)

Often in forecasting it makes sense to use sample weights that make your model focus more on recent history, and with most scikit-learn models you can introduce this through the fit method. It would be great if Darts made it easy to apply sensible weighting schemes for forecasting, such as an exponentially decaying weighting function.

Many thanks for the library!
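
For context, this is roughly what such recency weighting looks like with a plain scikit-learn estimator (a minimal sketch, not darts API; the toy data and the half-life value are made up):

import numpy as np
from sklearn.linear_model import LinearRegression

# toy lagged-features setup: X holds the features, y the target, oldest sample first
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=100)

# exponentially decaying weights: the most recent sample gets weight 1
half_life = 20
weights = 0.5 ** (np.arange(len(y))[::-1] / half_life)

model = LinearRegression()
model.fit(X, y, sample_weight=weights)  # most scikit-learn regressors accept sample_weight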

@Beerstabr added the triage label Aug 30, 2022
@dennisbader added the feature request label and removed the triage label Aug 30, 2022
@hrzn
Copy link
Contributor

hrzn commented Aug 31, 2022

Good idea, adding it to the backlog :) (and contributions are welcome!)

@hrzn added this to "To do" in darts via automation Aug 31, 2022
@Beerstabr (Contributor, Author)

I would definitely like to contribute!

@Beerstabr (Contributor, Author)

I was thinking of solving it much like the _create_lagged_data function of the RegressionModel class.

Starting out with three options:

  • equal weights
  • linearly decaying weights
  • exponentially decaying weights, like the formula attached as an image in the original comment (a rough sketch of all three schemes follows below)
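
For illustration, the three schemes could look roughly like this (a sketch only, one weight per training sample with the oldest sample first; the exact exponential formula from the attached image is not reproduced):

import numpy as np

n_samples = 50  # number of tabularized training samples, oldest first

# equal weights
equal_weights = np.ones(n_samples)

# linearly decaying: the oldest sample gets the smallest weight, the newest gets 1
linear_weights = np.linspace(1.0 / n_samples, 1.0, n_samples)

# exponentially decaying with some decay rate alpha
alpha = 0.05
exp_weights = np.exp(-alpha * np.arange(n_samples)[::-1])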

@hrzn (Contributor) commented Sep 7, 2022

Hi @Beerstabr, after checking, I think it should already be possible to do something like this:

my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(n_lags)])

because all the kwargs received by fit() are passed to the underlying estimator's fit() method.

@Beerstabr (Contributor, Author) commented Sep 8, 2022

Hi @hrzn, yes that's true. That's how I am currently doing it.

However, it gets slightly more complicated when you start using lags, an output_chunk_length > 1, or start training on multiple series (and there are probably other things to consider as well).

For example, when you use 8 lags your series gets cut short by 8 data points. In that case I think it should be:

my_model = RegressionModel(..., lags=n_lags, input_chunk_length=in_len)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(in_len - n_lags)])

And, if you want both lags and output_chunk_length > 1, then I believe it should be:

my_model = RegressionModel(..., lags=n_lags, output_chunk_length=out_len, input_chunk_length=in_len)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(in_len - np.max([n_lags, out_len]))])

And finally, when you're training on multiple series and these series differ in length, it gets a bit more complicated. In that case you'll need to take into account the order and the difference in length of the series. For example, in the case of exponentially decaying weights it could be like this:

import numpy as np

# function for calculating exponential weights
def exponential_sample_weights(ts, n_lags=8, multiple_series=False, max_series_length=np.nan):

    if not multiple_series:
        # one weight per training sample; the first n_lags points produce no samples
        T = len(ts) - n_lags
        sample_weights = [-np.log(1 - t / T) / (T - 1) for t in range(1, T + 1) if t < T] + [np.log(T) / (T - 1)]
    else:
        # scale against the longest series so equally old samples get equal weight
        T = max_series_length - n_lags
        T_self = len(ts) - n_lags
        sample_weights = [-np.log(1 - t / T) / (T - 1) for t in range(1 + (T - T_self), T + 1) if t < T] + [np.log(T) / (T - 1)]

    return sample_weights

# create a list with the weights in the same order as the series to which they belong
seq_sample_weights = []
max_len = np.max([len(series) for series in seq_series])
for series in seq_series:
    seq_sample_weights += exponential_sample_weights(ts=series,
                                                     n_lags=n_lags,
                                                     multiple_series=True,
                                                     max_series_length=max_len)

# fit the model (without a specific input_chunk_length)
my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight=seq_sample_weights)

In the latter case, be mindful that if the series differ in length and you train on multiple series, T should be the same for all series when calculating the exponentially decaying weights if you want to put equal weight on each series.

So, applying sample weights in Darts currently requires you to know what happens behind the scenes. Otherwise it's hard to get it working in the non-trivial cases and it's easy to make mistakes. Therefore I think it would be nice to have an easier way of doing it, like:

my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight_type='exponential')

Later on you could also add functionality to let the model focus more on specific series. But I would say that's of lesser importance.

@hrzn (Contributor) commented Sep 12, 2022

Hi @Beerstabr, first off, I'm sorry because I realised I made a mistake in my previous message: the sample_weight values are (obviously) per-sample weights and not per-dimension weights, as I was too quick to assume. Indeed, the actual number of samples is a relatively non-trivial function of the input chunk length (or the number of lags used on the target), the number of target series, and potentially the parameter max_samples_per_ts. Then, once all samples are built (this is done in RegressionModel._create_lagged_data()), the weights should be assigned to them as a function of how far in the past the corresponding target lag lies.
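
As a rough illustration of that sample count under simple assumptions (one target series, n_lags consecutive target lags, no covariates, no max_samples_per_ts cap; not the actual _create_lagged_data code):

import numpy as np

series_len, n_lags, out_len = 200, 8, 3

# each sample needs n_lags points of history and out_len points of labels
n_samples = series_len - n_lags - out_len + 1

# index (into the original series) of the last label of each sample, oldest sample first
label_end_idx = np.arange(n_lags + out_len - 1, series_len)

# e.g. weight each sample by how recent its last label is (exponential decay)
alpha = 0.05
sample_weight = np.exp(-alpha * (series_len - 1 - label_end_idx))
assert len(sample_weight) == n_samples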

I think it can be done and it could be a pretty nice feature indeed. However, it would also add a bit of complexity, because it would be strongly coupled to the tabularization logic. Nevertheless, if you feel like tackling it, we would be very happy to receive a PR in this direction. I would recommend waiting a little before you start, though, as we have a couple of other initiatives ongoing that touch the tabularization itself, so it'd be better to do it afterwards to avoid conflicts.

@Beerstabr (Contributor, Author)

Hi @hrzn,

Seems to me like a fun challenge to tackle. I'll wait for the right moment though. How will I know when the ongoing initiatives are done? Are there specific backlog items I can follow?

@madtoinou (Collaborator)

Hi @Beerstabr,

The PR refactoring the tabularization has been merged. If you're still interested in implementing this feature, it's more than welcome!

@Beerstabr (Contributor, Author) commented Mar 23, 2023 via email

@madtoinou assigned madtoinou and Beerstabr and unassigned madtoinou Mar 23, 2023
@madtoinou moved this from "To do" to "In progress" in darts Mar 23, 2023
@daniel-ressi commented Oct 3, 2023

Hi! I would highly appreciate this feature as well. I currently pass sample weights to the fit method the following way:

  1. create darts TimeSeries with the sample weights (in my case a list of TimeSeries)
  2. recompute the _get_feature_times and get_shared_times from the tabularization module (very redundant)
  3. slice the sample weights (darts TimeSeries) based on the shared times
  4. convert them into a numpy array
  5. pass them to sample_weight as an additional keyword argument to the fit method of the underlying model (a rough single-series sketch follows below)
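
A rough, simplified version of that workaround for a single target series with target lags only (no covariates, output_chunk_length=1, no max_samples_per_ts), avoiding the private tabularization helpers and relying on fit() forwarding extra kwargs to the underlying estimator as discussed above; API details may have changed since:

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import RegressionModel

n_lags = 8
times = pd.date_range("2020-01-01", periods=100, freq="D")
target = TimeSeries.from_times_and_values(times, np.random.randn(100).cumsum())

# per-timestamp weights, here exponentially decaying towards the past
weight_ts = TimeSeries.from_times_and_values(times, np.exp(-0.05 * np.arange(100)[::-1]))

# align the weight series with the target in time (relevant when they don't cover the same range)
weight_ts = weight_ts.slice_intersect(target)

# the first n_lags timestamps never appear as labels, so they produce no training samples
sample_weight = weight_ts.values().flatten()[n_lags:]

model = RegressionModel(lags=n_lags)
model.fit(target, sample_weight=sample_weight)  # forwarded to the sklearn estimator's fit()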

@gofford commented Oct 5, 2023

This would be a big addition. Weights would also allow alternative ways to handle missing values; e.g., https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html
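
For instance (illustrative only, not taken from the linked article), missing points could simply get zero weight after a naive fill, so the estimator ignores them:

import numpy as np

y = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
sample_weight = np.where(np.isnan(y), 0.0, 1.0)  # zero weight where the value was missing
y_filled = np.nan_to_num(y, nan=0.0)             # the filled value is irrelevant with weight 0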

@BohdanBilonoh (Contributor) commented Apr 8, 2024

Hi! There is an idea to make the weights part of the TimeSeries class, as an attribute on the underlying xarray (like static covariates or a hierarchy). I could contribute if the idea is valid.

@madtoinou (Collaborator)

There is an upcoming PR that will offer the possibility to either generate weights during tabularization or provide them as a TimeSeries when training the model. The logic is implemented; the contributor is now working on the tests.

I am not sure that adding it as an attribute of TimeSeries is the approach we want to take, as TimeSeries are immutable and one might be interested in testing several weighting approaches.

@BohdanBilonoh (Contributor) commented Apr 10, 2024

Sounds interesting. My motivation was to make the sample weights part of the input and use them as weight_cols for TimeSeries.from_dataframe. This could allow all slicing logic to be hidden behind the TimeSeries class and would allow weighting not only per sample, but per timestamp and/or per component. Does the new logic you mentioned cover such abilities?

@madtoinou (Collaborator)

The slicing logic will be hidden, but in the tabularization.

The upcoming implementation allows associating a weight with each timestamp, which is then converted to sample weights. I don't see how weighting could be performed on the component dimension; would you mind describing how this could be leveraged?

@BohdanBilonoh (Contributor) commented Apr 10, 2024

It will be interesting to see the code of the new logic.

Very simple example:
E-commerce time series that contain revenue and margin as targets and have to be predicted simultaneously (using the TiDE model), but where revenue is more important than margin.

@madtoinou (Collaborator)

I will make sure that the PR implementing this new feature is linked to this issue.

I think that this kind of "bias" should come from the loss/objective function; it's not really possible to influence a model to favor the optimization of one target component over another using another mechanism (at least to my knowledge). The model is usually responsible for identifying the most informative features (lags/components).

@BohdanBilonoh (Contributor) commented Apr 10, 2024

My vision of the sample weights was similar to the weights passed to Likelihood.compute_loss, and in that scenario the sample and/or timestamp and/or component could be weighted.
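
For illustration, a weighting of that kind could look like a loss that accepts weights broadcastable over the (batch, time, component) dimensions (a hypothetical sketch, not the actual Likelihood.compute_loss signature):

import torch

def weighted_mse(pred, target, weights):
    # pred, target: (batch, time, components); weights must be broadcastable to that shape,
    # so it can weight per sample (batch), per timestamp (time) and/or per component
    se = (pred - target) ** 2
    return (weights * se).sum() / weights.expand_as(se).sum()

# e.g. revenue twice as important as margin (per-component weights)
pred = torch.randn(32, 12, 2)
target = torch.randn(32, 12, 2)
loss = weighted_mse(pred, target, torch.tensor([2.0, 1.0]))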

@AntonRagot linked a pull request (#2362) Apr 30, 2024 that will close this issue