Exclude detection period from training period? #1491

adrianoesch · 2024-04-16T08:54:12Z

i've recently reverse engineered an anomaly score, because we expected an alarm to trigger that did not. while inspecting elementary queries being run in our warehouse, i noticed that the detection period is part of the training period (see here). i don't think this should be default behaviour, since we don't want the test data to impact the expectations we compute from past data (i.e. training data). including the detection period effectively makes tests less sensitive and could be a bigger issue for tests with less training data. i think this could be relatively easily fixed by limiting the window function to not include the current row (happy to make PR), but i guess this more of a conceptual question. would you be open to excluding the detection period from the training period or is this intended behaviour?

haritamar · 2024-05-15T13:42:37Z

Hi @adrianoesch ,
Thanks for opening this issue!
This behavior was originally intentional, though I understand the confusion and I'm also not 100% sure the default behavior is the correct one, but it does require some research before we feel comfortable with changing the default.

In any case - we'll definitely be open to a PR that adds this as a configuration option (that is not yet the default).
If you are willing to contribute this I'll be happy to provide further guidance.

adrianoesch · 2024-05-15T15:43:10Z

hi @haritamar and thanks for getting back. i'll have a look into a potential configurable solution. do you know of an exemplary PR that does sth similar? i guess i would start with putting the window limits of the anomaly_score computation into a separate macro that considers the config via elementary.get_config_var. but i guess the metric table window would also need to be adjusted, if we wanted to keep the amount of periods in the training data the same. and that seems a bit more complicated with seasonality and all. how would you go about this?

haritamar · 2024-05-19T13:18:31Z

Hi @adrianoesch ,
I think maybe for simplicity maybe it's not a must to also adapt the window when this flag is set - it means that for all points in the testing period will have training data of the same size (which may be actually what you want).
Given that you can also customize the training period I think it can be good enough.

So I think possible doing a local change only in the anomaly score computation as you suggested may be good enough.

adrianoesch added Bug Something isn't working Triage 👀 labels Apr 16, 2024

haritamar added Enhancement New feature or request Open to contribution 🧡 Awaiting Response Anomaly Detection and removed Bug Something isn't working Triage 👀 labels May 15, 2024

github-actions bot added Triage 👀 and removed Awaiting Response labels May 15, 2024

haritamar added Awaiting Response and removed Triage 👀 labels May 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exclude detection period from training period? #1491

Exclude detection period from training period? #1491

adrianoesch commented Apr 16, 2024

haritamar commented May 15, 2024 •

edited

adrianoesch commented May 15, 2024

haritamar commented May 19, 2024

Exclude detection period from training period? #1491

Exclude detection period from training period? #1491

Comments

adrianoesch commented Apr 16, 2024

haritamar commented May 15, 2024 • edited

adrianoesch commented May 15, 2024

haritamar commented May 19, 2024

haritamar commented May 15, 2024 •

edited