Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude detection period from training period? #1491

Open
adrianoesch opened this issue Apr 16, 2024 · 3 comments
Open

Exclude detection period from training period? #1491

adrianoesch opened this issue Apr 16, 2024 · 3 comments

Comments

@adrianoesch
Copy link

i've recently reverse engineered an anomaly score, because we expected an alarm to trigger that did not. while inspecting elementary queries being run in our warehouse, i noticed that the detection period is part of the training period (see here). i don't think this should be default behaviour, since we don't want the test data to impact the expectations we compute from past data (i.e. training data). including the detection period effectively makes tests less sensitive and could be a bigger issue for tests with less training data. i think this could be relatively easily fixed by limiting the window function to not include the current row (happy to make PR), but i guess this more of a conceptual question. would you be open to excluding the detection period from the training period or is this intended behaviour?

@adrianoesch adrianoesch added Bug Something isn't working Triage 👀 labels Apr 16, 2024
@haritamar
Copy link
Collaborator

haritamar commented May 15, 2024

Hi @adrianoesch ,
Thanks for opening this issue!
This behavior was originally intentional, though I understand the confusion and I'm also not 100% sure the default behavior is the correct one, but it does require some research before we feel comfortable with changing the default.

In any case - we'll definitely be open to a PR that adds this as a configuration option (that is not yet the default).
If you are willing to contribute this I'll be happy to provide further guidance.

@adrianoesch
Copy link
Author

hi @haritamar and thanks for getting back. i'll have a look into a potential configurable solution. do you know of an exemplary PR that does sth similar? i guess i would start with putting the window limits of the anomaly_score computation into a separate macro that considers the config via elementary.get_config_var. but i guess the metric table window would also need to be adjusted, if we wanted to keep the amount of periods in the training data the same. and that seems a bit more complicated with seasonality and all. how would you go about this?

@haritamar
Copy link
Collaborator

Hi @adrianoesch ,
I think maybe for simplicity maybe it's not a must to also adapt the window when this flag is set - it means that for all points in the testing period will have training data of the same size (which may be actually what you want).
Given that you can also customize the training period I think it can be good enough.

So I think possible doing a local change only in the anomaly score computation as you suggested may be good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants