Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support merge_update_columns logic for unit_tests #10115

Open
2 tasks done
rburke45 opened this issue May 8, 2024 · 2 comments
Open
2 tasks done

support merge_update_columns logic for unit_tests #10115

rburke45 opened this issue May 8, 2024 · 2 comments
Labels
enhancement New feature or request incremental Incremental modeling with dbt unit tests Issues related to built-in dbt unit testing functionality

Comments

@rburke45
Copy link

rburke45 commented May 8, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a model with set merge_update_columns is unit tested all fields are updated, not just those listed in merge_update_columns.

Expected Behavior

When a model with set merge_update_columns is unit tested only the fields listed in merge_update_columns should be updated.

Steps To Reproduce

Minimal example:
table.sql

{{
    config(
        materialized='table'
    )
}}

SELECT
  1 AS col1,
  1 AS col2,
  8 AS col3
UNION ALL
SELECT
  2 AS col1,
  1 AS col2,
  9 AS col3
UNION ALL
SELECT
  3 AS col1,
  1 AS col2,
  5 AS col3

update.sql

{{
    config(
        materialized='incremental',
        unique_key='col1',
        merge_update_columns=['col2'],
    )
}}

SELECT
  col1,
  col2,
  col3
FROM
  {{ref('table')}}
-- this filter will only be applied on an incremental run
{% if is_incremental() %}
WHERE   
  col2 > (SELECT MAX(col2) FROM {{ this }})
{% endif %}

unit_test.yml

unit_tests:
  - name: merge_update_test
    model: update
    overrides:
      macros:
        # unit test this model in "incremental" mode
        is_incremental: true 
    given:
      - input: ref('table')
        rows:
          - {col1: 1, col2: 2, col3: 3}
          - {col1: 2, col2: 3, col3: 4}

      - input: this
        rows:
          - {col1: 1, col2: 1, col3: 8}
          - {col1: 2, col2: 1, col3: 9}
          - {col1: 3, col2: 1, col3: 5}
    expect:
      rows:
        - {col1: 1, col2: 2, col3: 8}
        - {col1: 2, col2: 3, col3: 9}

Running this unit test gives the log output below. Calling dbt run to replicate the same data produces the expected behavior.

Relevant log output

actual differs from expected:

@@ ,col1,col2,col3
→  ,1   ,2   ,8→3
→  ,2   ,3   ,9→4

Environment

- OS: macOS 14.4.1 (23E224)
- Python: Python 3.12.2
- dbt-core: 1.8.0b1

Which database adapter are you using with dbt?

No response

Additional Context

I'm still relatively new to DBT, so please let me know if there's additional information I can provide.

@rburke45 rburke45 added bug Something isn't working triage labels May 8, 2024
@dbeatty10 dbeatty10 added unit tests Issues related to built-in dbt unit testing functionality incremental Incremental modeling with dbt labels May 9, 2024
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2024

@rburke45 Thanks for opening the issue (and for trying out dbt unit testing)!

In my opinion, this is a fair and reasonable limitation to unit tests and the merge_update_columns / merge_update_exclude_columns configurations. When is_incremental: true, the unit test is not validating the actual final result of the incremental update — it's validating the set of new rows & columns that the model's SQL is returning, which will then be applied within the materialization. (For this same reason, your unit test expect doesn't include the col1: 3 row, either, even though that row would remain in {{ this }} after the incremental update.)

I believe that an important principle of dbt is the separation of "transformation logic" and "materialization logic." As a general rule, dbt models should apply all transformation logic within their select statements, and the goal of the materialization should be to take the exact returned dataset and apply it to an object in the database (whether via simple create/replace or more complex merge/delete+insert). The merge_update_columns goes against this principle by saying, "There's one more piece of transformation logic to apply, during the materialization itself."

While that makes it more ergonomic to apply this trickier logic, it also makes it more difficult to preview or test the exact returned dataset before it's applied. Effectively, you can test this already in dbt, with a mocked (seed) input, a full model build, and an equality data test against the final output — but this is testing more than just the "unit" of your model's select statement.

Here's another way of accomplishing the same result — "on incremental runs, update col2 only" — but this time with that logic baked into the select statement itself:

{{
    config(
        materialized='incremental',
        unique_key='col1',
    )
}}

select
  {{ 'old.col1' if is_incremental() else 'new.col1' }},
  new.col2,
  {{ 'old.col3' if is_incremental() else 'new.col3' }},
from
  {{ref('table')}} as new
-- this filter will only be applied on an incremental run
{% if is_incremental() %}
left join {{ this }} as old
  on new.col1 = old.col1
where new.col2 > (select max(col2) from {{ this }})
{% endif %}

This SQL is trickier to read, but I do believe it's simpler to test, and more explicit as to what's happening: on incremental runs, col2 gets updated, but col1 + col3 keep their old values. This model definition passes the unit test in your example when I run it locally. You can also preview the exact results you're going to get (dbt show -s update) before you modify the existing table.

@jtcohen6 jtcohen6 removed the triage label May 9, 2024
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2024

@MichelleArk has alerted me to the fact that this issue is quite similar to another one we'd opened:

This is blocked on introducing a different strategy for unit tests, where all input fixtures + expected + actual are materialized in the data warehouse:

@graciegoheen graciegoheen added enhancement New feature or request and removed bug Something isn't working labels May 13, 2024
@graciegoheen graciegoheen changed the title [Bug] merge_update_columns logic isn't supported in unit_tests support merge_update_columns logic for unit_tests May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request incremental Incremental modeling with dbt unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

4 participants