[Feature] state:modified consider source column name changes as changes #10020
Comments
Hi! Can you say more about what "source column name changes" means? Are you imagining you have a yml file defining your source:
Then, you add a new column to this yml definition:
And you want the This is a bit odd because the source yml isn't the actual definition of your source's schema. Even if you haven't defined a column in the source yml, you can still select it in your dbt code as long as that column exists in the source table. So knowing that the yml has changed doesn't actually tell you whether the underlying warehouse object has changed. dbt does not "manage" your sources (i.e. it's not in charge of the DDL that creates your source tables; it just defines metadata that creates a pointer to your source table). I view " This would be different for external tables (cc: @dataders).
@graciegoheen Sorry for the lack of clarity. My problem was that the source column name actually changed, and people didn't edit the SQL code because they didn't know it was used. A developer changed it in source but didn't change any of the SQL code downstream.
version: 2
sources:
  - name: jaffle_shop
    tables:
      - name: orders
        columns:
          - name: id
          - name: status
to
version: 2
sources:
  - name: jaffle_shop
    tables:
      - name: orders
        columns:
          - name: id
          - name: status_different
And our CI runs,
Thanks for sharing that example! I still believe this shouldn't be marked as In your case, I would suggest you use staging models (docs here). If a column that's being used in your code is renamed in your source, you would have to update your staging model as well:
This model would then be marked as modified, and you could see whether anything else downstream needs to be updated as well with a CI run using
I'm 100% using staging models. The problem is that they didn't change them. I agree that if they had, it would have worked. But they didn't, and this didn't catch it. If you have any workarounds, I'd love to hear them.
Here's how I'm thinking about it. There are two types of changes that could break your dbt models:
The focus of CI is for validating that the latter doesn't break anything in your project. Updating your source yml file doesn't actually change any of the underlying SQL; it's a "documentation" change only. The types of changes that are selected by
Perhaps you could add a test to your source (like expect_table_columns_to_match_set from dbt-expectations package), so that the test would fail if you have schema drift of your sources. Then you could add a step to your CI job to first test all of your sources:
Changes to warehouse objects that dbt depends on may happen at any time, though, so that would technically be unrelated to the PR you've opened.
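Conceptually, a schema-drift test of this kind compares the columns actually present in the warehouse table with an expected list. The Python sketch below only illustrates that comparison; it is not how dbt-expectations implements its test (that package ships SQL-based dbt tests), and `column_drift` is a hypothetical helper name.

```python
from typing import Iterable, Set, Tuple


def column_drift(expected: Iterable[str], actual: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) column names, compared case-insensitively.

    Hypothetical helper for illustration only; not part of dbt or dbt-expectations.
    """
    expected_set = {name.lower() for name in expected}
    actual_set = {name.lower() for name in actual}
    missing = expected_set - actual_set      # documented but no longer in the table
    unexpected = actual_set - expected_set   # in the table but not documented
    return missing, unexpected


# The rename from the example in this thread: status -> status_different.
missing, unexpected = column_drift(
    expected=["id", "status"],
    actual=["id", "status_different"],
)
print(missing, unexpected)
```

A non-empty `missing` set is the signal that something downstream may still be selecting a column that no longer exists.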
I wouldn't expect description to matter, obviously. And if I add a test to the source, it still wouldn't fix the problem, because they'd just fix that test and still not propagate the changes to the downstream models. I could add a comment reminding people, but that seems antithetical to the purpose of the pipeline. By your argument, NOTHING that affects the source would affect the SQL model, so nothing should be in. But in the code it states
If quoting changes are changes, why would column names not be? It seems quite parallel in thought process. Freshness has nothing to do with whether the dbt SQL runs. So I'd disagree with your assertion.
Quoting does change the SQL that's being executed, because it impacts how the source macro will be resolved. So when you reference a source in a model:
That will resolve differently based on how you have configured quoting:
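As a rough illustration of why quoting changes the executed SQL, the sketch below renders a relation name with and without quoting. This is not dbt's actual implementation: `render_relation` is a hypothetical helper, the database name `raw` is an assumed placeholder, and the double-quote character is just one adapter-style convention.

```python
from typing import Dict


def render_relation(
    database: str,
    schema: str,
    identifier: str,
    quoting: Dict[str, bool],
    quote_char: str = '"',
) -> str:
    """Join the three name parts, quoting each part only if its flag is set.

    Hypothetical helper for illustration; dbt's real rendering lives in its adapters.
    """
    parts = {"database": database, "schema": schema, "identifier": identifier}
    rendered = []
    for key, value in parts.items():
        if quoting.get(key, False):
            rendered.append(f"{quote_char}{value}{quote_char}")
        else:
            rendered.append(value)
    return ".".join(rendered)


unquoted = render_relation("raw", "jaffle_shop", "orders", {})
quoted = render_relation(
    "raw", "jaffle_shop", "orders",
    {"database": True, "schema": True, "identifier": True},
)
print(unquoted)  # raw.jaffle_shop.orders
print(quoted)    # "raw"."jaffle_shop"."orders"
```

Because the rendered string ends up in the compiled model SQL, a quoting change alters what the warehouse executes, whereas a column entry in the source yml does not.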
Similarly, freshness can impact the SQL that's run when you execute the
that filter will be added to the freshness query:
Fair enough. Thanks for at least looking at it.
Is this your first time submitting a feature request?
Describe the feature
Currently it doesn't appear that changes to source column names are detected.
I'm suggesting we detect at least if the column names change.
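A minimal sketch of what a name-only comparison could look like, assuming a simplified stand-in for dbt's source node. `SourceDefinitionSketch`, `ColumnInfo`, and `make_source` are hypothetical names for illustration, and `same_columns` here is only a guess at the shape of such a method, not dbt-core code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ColumnInfo:
    # Hypothetical stand-in for dbt's column metadata.
    name: str
    description: str = ""


@dataclass
class SourceDefinitionSketch:
    # Hypothetical, heavily simplified stand-in for ParsedSourceDefinition.
    name: str
    columns: Dict[str, ColumnInfo] = field(default_factory=dict)

    def same_columns(self, other: "SourceDefinitionSketch") -> bool:
        # Compare only the set of column names; descriptions and order are ignored.
        return set(self.columns) == set(other.columns)


def make_source(*column_names: str) -> SourceDefinitionSketch:
    # Convenience constructor for the example below.
    return SourceDefinitionSketch(
        name="orders",
        columns={c: ColumnInfo(name=c) for c in column_names},
    )


old = make_source("id", "status")
renamed = make_source("id", "status_different")
print(old.same_columns(renamed))  # False: the rename would count as a change
```

Comparing names only means that editing a column description would not mark the source as modified, which matches the intent of detecting renames rather than documentation edits.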
We could just add a same_columns method in ParsedSourceDefinition.
Describe alternatives you've considered
If more than name is required, might want to add a self.columns.same_contents method similar to self.config.same_contents.
Who will this benefit?
Maintainers trusting their CI Pipelines (maybe too much) when source columns change out from under them.
Are you interested in contributing this feature?
Yes
Anything else?
Slack Conversation