i've recently reverse engineered an anomaly score, because we expected an alarm to trigger that did not. while inspecting elementary queries being run in our warehouse, i noticed that the detection period is part of the training period (see here). i don't think this should be the default behaviour, since we don't want the test data to impact the expectations we compute from past data (i.e. training data). including the detection period effectively makes tests less sensitive, and could be a bigger issue for tests with less training data. i think this could be fixed relatively easily by limiting the window function to not include the current row (happy to make a PR), but i guess this is more of a conceptual question. would you be open to excluding the detection period from the training period, or is this intended behaviour?
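To make the sensitivity effect concrete, here is a minimal Python sketch. It is not Elementary's actual SQL: it assumes a plain z-score with a population standard deviation, and the metric values, the `z_score` helper, and the threshold of 3 are all made up for illustration.

```python
from statistics import mean, pstdev

def z_score(value, training):
    """Plain z-score of `value` against a list of training values (hypothetical helper)."""
    sd = pstdev(training)
    return 0.0 if sd == 0 else (value - mean(training)) / sd

history = [100, 102, 98, 101, 99, 100, 103, 97]  # made-up training-period metrics
current = 130                                     # made-up detection-period value

# Training window that includes the current row.
score_inclusive = z_score(current, history + [current])

# Training window that stops at the preceding row.
score_exclusive = z_score(current, history)

print(f"inclusive: {score_inclusive:.2f}, exclusive: {score_exclusive:.2f}")
```

With these numbers the inclusive score stays below a threshold of 3, while the exclusive score is far above it. Under these assumptions, with 9 points in the window and the outlier among them, the score cannot exceed sqrt(8) ≈ 2.83 no matter how extreme the outlier is, which is why short training windows are hit hardest.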
Hi @adrianoesch ,
Thanks for opening this issue!
This behavior was originally intentional, though I understand the confusion and I'm also not 100% sure the default behavior is the correct one, but it does require some research before we feel comfortable with changing the default.
In any case - we'll definitely be open to a PR that adds this as a configuration option (that is not yet the default).
If you are willing to contribute this I'll be happy to provide further guidance.
hi @haritamar and thanks for getting back. i'll have a look into a potential configurable solution. do you know of an exemplary PR that does something similar? i guess i would start by putting the window limits of the anomaly_score computation into a separate macro that reads the config via elementary.get_config_var. but i guess the metric table window would also need to be adjusted if we wanted to keep the number of periods in the training data the same, and that seems a bit more complicated with seasonality and all. how would you go about this?
Hi @adrianoesch ,
I think that, for simplicity, it's not a must to also adapt the window when this flag is set. It means that all points in the testing period will have training data of the same size (which may actually be what you want).
Given that you can also customize the training period I think it can be good enough.
So I think making a local change only in the anomaly score computation, as you suggested, may be good enough.
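A local change of that kind could look roughly like the Python sketch below. It is only an illustration of the idea, not Elementary's macro code: the function `anomaly_scores`, the flag name `exclude_current_row`, and the use of a population standard deviation are all assumptions. The start of the window is left untouched, and only its end moves when the flag is set.

```python
from statistics import mean, pstdev

def anomaly_scores(values, window, exclude_current_row=False):
    """Rolling z-scores; the hypothetical flag drops the current row from training."""
    scores = []
    for i, value in enumerate(values):
        start = max(0, i - window + 1)              # window start is not adapted
        end = i if exclude_current_row else i + 1   # only the window end changes
        training = values[start:end]
        if len(training) < 2:
            scores.append(None)                     # not enough training data
            continue
        sd = pstdev(training)
        scores.append(None if sd == 0 else (value - mean(training)) / sd)
    return scores

metrics = [100, 102, 98, 101, 99, 100, 103, 97, 130]  # made-up values
default_scores = anomaly_scores(metrics, window=9)
strict_scores = anomaly_scores(metrics, window=9, exclude_current_row=True)
```

Keeping the flag off by default preserves the current behaviour. With the flag on, the training set for a row is one point smaller, which matches the suggestion not to adapt the metric window.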