Ability to select some specific windows to train / valid / predict by mask (BaseWindows) #904

Open
ISPritchin opened this issue Feb 28, 2024 · 2 comments

Comments

@ISPritchin

Description

Thanks for a wonderful product. The more I read the source code, the more impressed I am by the quality of this development.

In our problem, the model only needs to predict those points in a time series that satisfy certain conditions. We would like a filtering mechanism that decides whether a given window should be used for training / validation or not.

Use case

I saw that one of the selection mechanisms is available_mask, but it does not completely solve our problem, because we need a mask that selects whole windows rather than individual points. Maybe the library already has a solution for my problem, but I couldn't find it.

Here is an example of a task where window filtering is required. In the simplest case, suppose we are given a vector y. We would like to forecast not for all ('unique_id', 'ds') points, but only for those that satisfy some criterion. That is, we have to select certain time periods for prediction, and these periods are specific to each individual client. The criterion can filter out a very large number of windows from training. It can be calculated by the user in advance, before training, and provided to the model.

Suggested solution:

  • I suggest adding another reserved column name (like 'unique_id', 'ds', 'available_mask'). In the example below I use the name 'is_used'.
  • If the last value in the input_size part of this column (the value at index input_size - 1 within the window) is 1, the window is selected; otherwise it is skipped.
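As a rough illustration of how a user might precompute such a column before training, here is a minimal pandas sketch. The data and the criterion (a point is usable when y is above that client's own mean) are hypothetical, and 'is_used' is only the column name proposed in this issue, not something the library currently recognizes.

```python
import pandas as pd

# Hypothetical long-format data: two clients, four daily observations each.
df = pd.DataFrame({
    "unique_id": ["a"] * 4 + ["b"] * 4,
    "ds": list(pd.date_range("2024-01-01", periods=4, freq="D")) * 2,
    "y": [3.0, 4.0, 5.0, 6.0, 10.0, 0.0, 12.0, 13.0],
})

# Hypothetical per-client criterion: mark a point as usable when y is
# above the mean of its own series. Any user-defined rule could go here.
client_mean = df.groupby("unique_id")["y"].transform("mean")
df["is_used"] = (df["y"] > client_mean).astype(int)

print(df)
```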

Example:

input_size = 3
h = 2
y       = [3, 4, 5, 6, 7, 8, 9, 10]
is_used = [0, 0, 1, 0, 1, 1, 0, 0]

Currently the following windows would be obtained:

[
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
    [6, 7, 8, 9, 10]
]

But using the new column we would like to get:

[
    [3, 4, 5, 6, 7],
    [5, 6, 7, 8, 9],
    [6, 7, 8, 9, 10]
]

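These windows were selected because, within each window, the is_used value at index input_size - 1 (the last input position) is 1. The window [4, 5, 6, 7, 8] is dropped because its is_used slice is [0, 1, 0, 1, 1], which has 0 at that index.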

y = [3, 4, 5, 6, 7],  is_used = [0, 0, 1, 0, 1]
y = [5, 6, 7, 8, 9],  is_used = [1, 0, 1, 1, 0] 
y = [6, 7, 8, 9, 10], is_used = [0, 1, 1, 0, 0] 
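The selection rule in this example can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the library's BaseWindows code; the function name select_windows is hypothetical, and a step size of 1 is assumed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def select_windows(y, is_used, input_size, h):
    """Keep only windows whose last input position is flagged in is_used.

    Hypothetical helper illustrating the proposal; step size is 1.
    """
    window_size = input_size + h
    # All contiguous windows of length input_size + h.
    y_windows = sliding_window_view(np.asarray(y), window_size)
    mask_windows = sliding_window_view(np.asarray(is_used), window_size)
    # A window is kept when is_used at index input_size - 1 equals 1.
    keep = mask_windows[:, input_size - 1] == 1
    return y_windows[keep], mask_windows[keep]


y = [3, 4, 5, 6, 7, 8, 9, 10]
is_used = [0, 0, 1, 0, 1, 1, 0, 0]

selected_y, selected_mask = select_windows(y, is_used, input_size=3, h=2)
print(selected_y.tolist())
# [[3, 4, 5, 6, 7], [5, 6, 7, 8, 9], [6, 7, 8, 9, 10]]
```

Filtering on a boolean index after building the windows keeps the original windowing logic unchanged, which is one possible way to add the mask without touching how windows are created.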

I am convinced that implementing this functionality would greatly extend the capabilities of the library. I can provide any information you need to resolve this issue.

Thanks for your hard work. I do not rule out that someday my team and I will be able to join the contributors to your project.

@ISPritchin
Author

Were you able to understand the idea described above? I have in fact completed the implementation locally, and it seems to be ready for a pull request.

@ISPritchin ISPritchin changed the title Ability to select some specific windows to train by mask (BaseWindows) Ability to select some specific windows to train / valid / predict by mask (BaseWindows) Mar 5, 2024
@elephaint
Contributor

Thanks for the suggestion, I think I understand the request. Feel free to file a PR with the request so that we can review.
