detection_period property works incorrectly #1523

Open
dmitrii-khr opened this issue May 10, 2024 · 4 comments

@dmitrii-khr

Describe the bug
I'd like to complain again about the detection_period configuration property.
The following setup

detection_period:
  period: day
  count: 1

is translated into the following condition in the Elementary query that decides whether or not to alert:

bucket_end >= 
    dateadd(day, cast('-1' as integer), cast(max_bucket_end as timestamp))

Because the condition is non-strict, the test alerts on 2 days of buckets, whereas the alert is expected to cover only 1 day.
Currently it is not possible to configure a time-series test to alert for the current day only and not for issues from yesterday. A count of 0 causes the test to fail.

To Reproduce
Set up a time-series test with a bucket size of 1 day and a detection_period of 1 day.

Expected behavior
With a detection period of 1 day, the test should not fail if the start date of the anomalous time bucket is more than 24 hours ago.
The condition that decides whether the test should fail has to be strict:

bucket_end >
    dateadd(day, cast('-1' as integer), cast(max_bucket_end as timestamp))
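To see how the two predicates differ for any `count`, here is a minimal Python sketch. It assumes midnight-aligned daily buckets and translates only the `dateadd` predicate quoted above; `buckets_in_detection_period` is a hypothetical helper, not Elementary code. With `>=` the filter admits `count + 1` buckets, with `>` exactly `count`.

```python
from datetime import datetime, timedelta


def buckets_in_detection_period(bucket_ends, max_bucket_end, period, count, strict):
    """Return the bucket ends that pass the detection-period filter.

    Hypothetical translation of the predicate discussed in this issue:
        strict=False: bucket_end >= dateadd(period, -count, max_bucket_end)
        strict=True:  bucket_end >  dateadd(period, -count, max_bucket_end)
    """
    threshold = max_bucket_end - count * period
    if strict:
        return [end for end in bucket_ends if end > threshold]
    return [end for end in bucket_ends if end >= threshold]


# Hypothetical daily buckets: 10 consecutive midnights ending at max_bucket_end.
max_end = datetime(2024, 5, 21)
day = timedelta(days=1)
ends = [max_end - i * day for i in range(10)]

for count in (1, 2, 3):
    non_strict_hits = buckets_in_detection_period(ends, max_end, day, count, strict=False)
    strict_hits = buckets_in_detection_period(ends, max_end, day, count, strict=True)
    # count, buckets admitted by ">=", buckets admitted by ">"
    print(count, len(non_strict_hits), len(strict_hits))
```

Running it prints `1 2 1`, `2 3 2` and `3 4 3`: the non-strict comparison always pulls in one extra bucket, the one whose `bucket_end` equals the threshold exactly.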

Environment:

  • dbt package Version: 0.15.1
dmitrii-khr added the Bug (Something isn't working) and Triage 👀 labels on May 10, 2024
@dmitrii-khr
Author

wrong repo

dmitrii-khr closed this as not planned on May 10, 2024
dmitrii-khr reopened this on May 10, 2024
@dmitrii-khr
Author

@haritamar
Collaborator

Hi @dmitrii-khr, we actually prefer Elementary issues to be concentrated in this repo, so it's good that you opened it here. I'll close the other one.

Currently, this is expected behavior, though I understand the confusion.
When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today.
So it's actually not a bug. Note, however, that daily buckets currently always run from midnight to midnight, not over the "last 24 hours".
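The difference between a midnight-aligned daily bucket and a rolling "last 24 hours" window can be sketched in Python as follows. This is an illustration under the assumption that buckets start at midnight as described above; both helper names and the run time are hypothetical.

```python
from datetime import datetime, timedelta


def last_complete_daily_bucket(now):
    """Hypothetical helper: the midnight-to-midnight bucket that most recently ended before `now`."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=1), today_midnight


def rolling_24_hours(now):
    """Hypothetical helper: a 'last 24 hours' window, which daily buckets do not implement."""
    return now - timedelta(hours=24), now


now = datetime(2024, 5, 21, 15, 0)  # hypothetical run time
print(last_complete_daily_bucket(now))
print(rolling_24_hours(now))
```

For a run at 15:00 on 2024-05-21, the last complete daily bucket spans 2024-05-20 00:00 to 2024-05-21 00:00, whereas a rolling window would span 2024-05-20 15:00 to 2024-05-21 15:00.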

If you'd like the test to be more real-time, you may want to consider reducing the time bucket from daily to something smaller. For example, hourly buckets can be set like this:

time_bucket:
    period: hour
    count: 1
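Why smaller buckets make the test more real-time can be sketched by computing when the bucket containing an anomalous row closes, since only complete buckets are tested. A minimal Python sketch, assuming midnight-aligned buckets whose size divides a day evenly; `bucket_close_time` and the event timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def bucket_close_time(event_time, bucket):
    """End of the midnight-aligned bucket that contains event_time.

    Assumes `bucket` divides a day evenly, e.g. timedelta(hours=1) or timedelta(days=1).
    """
    midnight = event_time.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = event_time - midnight
    # Whole buckets elapsed since midnight (timedelta // timedelta gives an int),
    # then step to the end of the bucket the event falls into.
    index = elapsed // bucket
    return midnight + (index + 1) * bucket


event = datetime(2024, 5, 21, 9, 30)  # hypothetical anomalous row
for size in (timedelta(days=1), timedelta(hours=1)):
    closes = bucket_close_time(event, size)
    # bucket size, time the bucket becomes complete, wait until it can be tested
    print(size, closes, closes - event)
```

For a row at 09:30, the daily bucket closes 14.5 hours later (2024-05-22 00:00), while the hourly bucket closes after 30 minutes.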

Please let me know if this makes sense.
Thanks!

@dmitrii-khr
Author

Hi!
Thank you for the response!

> When you use a detection period of 1 day, we actually include the last full bucket as the detection - in that sense the test will work only on yesterday and not on today.

From what I can see in the logs, it works the other way.

Let's consider an example.
Setup: a column anomaly test on a timestamp column that includes the time, with no detection delay. The data is updated many times a day, including the current day (2024-05-21).
The test considers complete buckets only. The first query determines the following bucket boundaries:
[image: bucket boundaries determined by the first query]
Notice that today's values are not going to be in the buckets at all.
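The exclusion of today's rows can be sketched by enumerating the complete daily buckets at run time. A minimal Python sketch, assuming midnight-aligned buckets and no detection delay as in this example; `complete_daily_buckets`, the start date and the run time are hypothetical.

```python
from datetime import datetime, timedelta


def complete_daily_buckets(first_day, run_time):
    """Hypothetical helper: midnight-aligned daily buckets that have fully ended by run_time."""
    last_end = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    buckets = []
    start = first_day
    # A bucket is complete only if its end is not later than today's midnight.
    while start + timedelta(days=1) <= last_end:
        buckets.append((start, start + timedelta(days=1)))
        start += timedelta(days=1)
    return buckets


run = datetime(2024, 5, 21, 15, 0)  # hypothetical run time on the current day
buckets = complete_daily_buckets(datetime(2024, 5, 14), run)
print(buckets[-1])
```

The last complete bucket is 2024-05-20 to 2024-05-21, so a row stamped 2024-05-21 10:00 falls into none of the buckets.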

Following the logs and intermediate results further, we get the following picture:
[image: intermediate results from the logs, with is_anomalous per bucket]
The bucket 2024-05-19 to 2024-05-20 is marked as is_anomalous. It is not the last full bucket: it is not yesterday, it is 2 days ago.

It happens because of the non-strict condition in the anomaly_scores_with_is_anomalous CTE:
and bucket_end >= dateadd(day, cast('-1' as integer), cast(max_bucket_end as timestamp))
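Plugging the dates from this example into the quoted predicate shows which buckets pass each comparison. A minimal Python sketch; it assumes `max_bucket_end` is 2024-05-21 00:00 (the end of the last complete bucket, since today's rows are excluded) and emulates `dateadd(day, -1, ...)` with `timedelta`.

```python
from datetime import datetime, timedelta

# Assumed end of the last complete bucket (today's rows are not bucketed).
max_bucket_end = datetime(2024, 5, 21)
# Emulates dateadd(day, cast('-1' as integer), max_bucket_end).
threshold = max_bucket_end + timedelta(days=-1)

candidate_buckets = {
    "2024-05-19 to 2024-05-20": datetime(2024, 5, 20),
    "2024-05-20 to 2024-05-21": datetime(2024, 5, 21),
}

for label, bucket_end in candidate_buckets.items():
    print(label, "non-strict:", bucket_end >= threshold, "strict:", bucket_end > threshold)
```

Both buckets pass the non-strict comparison, but only the 2024-05-20 to 2024-05-21 bucket passes the strict one.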

Overall, the test fails:
[image: failed test result]
