Easy Sample Weights #1175
Comments
Good idea, adding to the backlog :) (and contributions are welcome!) |
I would definitely like to contribute! |
Hi @Beerstabr, after checking, I think it should already be possible to do something like this:

my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(n_lags)])

because all the |
Hi @hrzn, yes that's true. That's how I am currently doing it. However, it gets slightly more complicated when you start using lags, an output_chunk_length > 1, or start training on multiple series (and there are probably other things to consider as well). For example, when you use 8 lags your series gets cut short by 8 data points. In that case I think it should be:

my_model = RegressionModel(..., lags=n_lags, input_chunk_length=in_len)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(in_len - n_lags)])

And, if you want both lags and output_chunk_length > 1, then I believe it should be:

my_model = RegressionModel(..., lags=n_lags, output_chunk_length=out_len, input_chunk_length=in_len)
my_model.fit(..., sample_weight=[1. / (in_len - i) for i in range(in_len - np.max([n_lags, out_len]))])

And finally, when you're training on multiple series and these series differ in length, it gets a bit more complicated. In that case you'll need to take into account the order of the series and the differences in their lengths. For example, in the case of exponentially decaying weights it could be like this:

# function for calculating exponential weights
import numpy as np

def exponential_sample_weights(ts, n_lags=8, multiple_series=False, max_series_length=np.nan):
    if not multiple_series:
        T = len(ts) - n_lags
        sample_weights = [-np.log(1 - t / T) / (T - 1) for t in range(1, T + 1) if t < T] + [np.log(T) / (T - 1)]
    else:
        T = max_series_length - n_lags
        T_self = len(ts) - n_lags
        sample_weights = [-np.log(1 - t / T) / (T - 1) for t in range(1 + (T - T_self), T + 1) if t < T] + [np.log(T) / (T - 1)]
    return sample_weights
# create a list with the weights, in the same order as the series to which they belong
seq_sample_weights = []
max_len = np.max([len(series) for series in seq_series])
for series in seq_series:
    seq_sample_weights += exponential_sample_weights(ts=series,
                                                     multiple_series=True,
                                                     max_series_length=max_len)
# fit the model (without a specific input_chunk_length)
my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight=seq_sample_weights)

In the latter case you have to be very mindful of the fact that, if the series differ in length and you train on multiple series, then when you calculate exponentially decaying weights T should be the same for all series if you want to put equal weight on each series. So, if you want to apply sample weights in Darts, it currently requires that you know what happens behind the scenes. Otherwise it's hard to get it going in the non-trivial cases and it's easy to make mistakes. Therefore I think it would be nice to have an easier way of doing it, like:

my_model = RegressionModel(..., lags=n_lags)
my_model.fit(..., sample_weight_type='exponential')

Later on you could also add functionality to let the model focus more on specific series. But I would say that's of lesser importance. |
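The length bookkeeping discussed above can be checked independently of Darts. The sketch below uses a simpler geometric decay (the function name `exponential_weights` and the `decay` parameter are illustrative, not Darts API): with `n_lags` lags and an output_chunk_length of 1, a series of length L yields L - n_lags training samples, so the weight vector must have exactly that length.

```python
import numpy as np

def exponential_weights(series_len, n_lags, decay=0.9):
    """Illustrative only: the most recent training sample gets weight 1;
    earlier samples decay geometrically. With `n_lags` lags a series of
    length `series_len` yields `series_len - n_lags` training samples
    (output_chunk_length assumed to be 1)."""
    n_samples = series_len - n_lags
    # oldest sample first, newest sample last
    return np.array([decay ** (n_samples - 1 - i) for i in range(n_samples)])

# a series of length 12 with 8 lags gives 12 - 8 = 4 samples
w = exponential_weights(series_len=12, n_lags=8, decay=0.5)
# → array([0.125, 0.25, 0.5, 1.0]); newest sample weighted highest
```

The same counting rule generalizes to output_chunk_length > 1 by subtracting max(n_lags, out_len) instead, as in the comment above.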
Hi @Beerstabr, first off, I'm sorry because I realised I made a mistake in my previous message. I think it can be done, and it could be a pretty nice feature indeed. However, it would also add a bit of complexity, because it would be strongly coupled to the tabularization logic. Nevertheless, if you feel like tackling it, we would be very happy to receive a PR in this direction. I would recommend that you wait a little before you start, though, as we have a couple of other ongoing initiatives touching the tabularization itself, so it'd be better to do it afterwards to avoid conflicts. |
Hi @hrzn, Seems to me like a fun challenge to tackle. I'll wait for the right moment though. How will I know when the ongoing initiatives are done? Are there specific backlog items I can follow? |
Hi @Beerstabr, The PR refactoring the tabularization has been merged. If you're still interested in implementing this feature, it's more than welcome! |
Definitely! It's really just scratching my own itch, because I would love to use the feature myself.
|
Hi! I would highly appreciate this feature as well. I currently pass sample weights to the fit method the following way:
|
This would be a big addition. Weights would also allow alternative ways to handle missing values; e.g., https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html |
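The missing-values idea from the linked article can be sketched with plain scikit-learn (not the Darts API): impute the gaps with any placeholder, then give the imputed points zero weight so they don't influence the fit. The toy data below is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# toy series with gaps; the observed points happen to follow y = x + 1
y = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0])
X = np.arange(len(y)).reshape(-1, 1)

# zero weight on missing points: the placeholder value never matters
weights = np.where(np.isnan(y), 0.0, 1.0)
y_filled = np.where(np.isnan(y), 0.0, y)

model = LinearRegression().fit(X, y_filled, sample_weight=weights)
# the zero-weighted rows are ignored, so the fit recovers y = x + 1
```

Because weighted least squares drops zero-weight rows entirely, this is equivalent to fitting on the observed points only, without having to reindex the series.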
Hi! There is an idea to make the weights part of the |
There is an upcoming PR that will offer the possibility to either generate weights during tabularization or provide them as a I am not sure that adding it as an attribute of |
Sounds interesting. My motivation was to make the sample weights part of the input and use them as |
The slicing logic will be hidden, but in the tabularization. The upcoming implementation allows associating a weight with each timestamp, which is then converted to sample weights. I don't see how weighting could be performed on the component dimension; would you mind describing how this can be leveraged? |
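One plausible reading of "a weight per timestamp, converted to sample weights" is sketched below: each tabularized sample takes the weight of its label timestamp. This is only a guess at the mapping; the function name and the aggregation rule (first label timestamp) are assumptions, and the actual Darts implementation may differ.

```python
import numpy as np

def timestamp_to_sample_weights(ts_weights, n_lags, output_chunk_length=1):
    """Hypothetical sketch: map one weight per timestamp to one weight per
    tabularized sample, using the weight of each sample's first label
    timestamp. Not the actual Darts logic."""
    n_samples = len(ts_weights) - n_lags - (output_chunk_length - 1)
    return np.array([ts_weights[n_lags + i] for i in range(n_samples)])

w_ts = np.linspace(0.1, 1.0, 10)   # one weight per timestamp
w_samples = timestamp_to_sample_weights(w_ts, n_lags=3)
# 10 - 3 = 7 samples, each weighted by its label timestamp
```

The appeal of this interface is that the user reasons about timestamps, which they understand, while the lag/chunk slicing stays hidden inside the tabularization.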
It will be interesting to see the code of the new logic. Very simple example: |
I will make sure that the PR implementing this new feature will be linked to this PR. I think that this kind of "bias" should come from the loss/objective function, it's not really possible to influence a model to favor the optimization of one target component over another using another mechanism (at least to my knowledge). The model is usually responsible for identifying the most informative features (lags/components). |
My vision of the sample weights was similar to the weights passed to |
Often in forecasting it makes sense to use sample weights that make your model focus more on the recent history. With most scikit-learn models you can introduce this through the fit method. It would be great if Darts made it easy to implement sensible weighting schemes for forecasting, such as an exponentially decaying weighting function.
Many thanks for the library!
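What the issue asks for can be done by hand with scikit-learn today, which also shows why a built-in helper would be welcome. The sketch below tabularizes a toy series manually and passes exponentially decaying weights to `fit`; the `half_life` value and the lag construction are illustrative choices, not Darts internals.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=100))   # toy random-walk series
n_lags = 8

# manual tabularization: each row holds the previous n_lags values,
# the target is the next value
X = np.array([y[i:i + n_lags] for i in range(len(y) - n_lags)])
t = y[n_lags:]

# exponentially decaying weights: the newest sample gets weight 1,
# a sample `half_life` steps older gets weight 0.5, and so on
half_life = 20
weights = 0.5 ** ((len(t) - 1 - np.arange(len(t))) / half_life)

model = Ridge().fit(X, t, sample_weight=weights)
```

The boilerplate above (tabularization, sample counting, weight alignment) is exactly what a `sample_weight_type='exponential'` option in Darts would hide.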