-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance manifest structure validation and add debug logging #884
base: main
Are you sure you want to change the base?
Conversation
This commit introduces several improvements to the manifest file processing logic to ensure robustness and ease of debugging: 1. **Manifest Structure Validation**: Added checks to verify that the manifest file contains the expected top-level keys ("nodes", "sources", "exposures"). 2. **Data Type Validation**: Implemented additional validation to confirm that the "nodes" key maps to a dictionary. This step helps catch issues where the manifest's format deviates from expected structures, particularly in how nodes are represented. 3. **Key Presence Validation**: For each node within the manifest, we now verify the presence of essential keys ("resource_type", "original_file_path", "tags", "config").
✅ Deploy Preview for sunny-pastelito-5ecb04 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
✅ Deploy Preview for sunny-pastelito-5ecb04 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlgirdasDubickas thanks for the contribution!
Some comments:
-
Did you run into some issue that motivated this PR..?
-
Is the overall goal to check if the manifest contains fields that Cosmos relies on, or to check the manifest is compliant to dbt schema in a broader sense? I was particularly curious of in which scenario
nodes
are not a dictionary. -
If the answer for (2) is to validate the dbt schema, would it make sense to use dbt's artifact schemas to validate this? https://schemas.getdbt.com/, and validate the manifest based on the dbt version?
-
Should we allow users to opt-in validating the schema or not, via a configuration in
RenderConfig
? Since this may add additional processing to DAG parsing, it may be worth to have a configuration for enabling / disabling this type of check.
Last but not least, it would be great to have some tests for this feature, once we confirm the original problem / overall approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @AlgirdasDubickas, I wanted to share some thoughts regarding the proposed change. It seems like implementing this might introduce some additional latency during DAG parsing. Additionally, I noticed that there haven't been any updates on the PR for quite some time. I was just wondering if you are still planning to work on this or if there's anything I can assist with.
raise CosmosLoadDbtException("Invalid structure in manifest file") | ||
|
||
# Further validation can be performed here, e.g., checking data types | ||
if not isinstance(manifest["nodes"], dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should always be a dictionary. Have you ever encountered a situation where it wasn't?
if not all(key in manifest for key in ["nodes", "sources", "exposures"]): | ||
logger.error("Manifest file structure is not as expected.") | ||
raise CosmosLoadDbtException("Invalid structure in manifest file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering what happens today if this validation isn't in place when you have an incorrect manifest?
This commit introduces several improvements to the manifest file processing logic to ensure robustness and ease of debugging:
Manifest Structure Validation: Added checks to verify that the manifest file contains the expected top-level keys ("nodes", "sources", "exposures").
Data Type Validation: Implemented additional validation to confirm that the "nodes" key maps to a dictionary. This step helps catch issues where the manifest's format deviates from expected structures, particularly in how nodes are represented.
Key Presence Validation: For each node within the manifest, we now verify the presence of essential keys ("resource_type", "original_file_path", "tags", "config").
Description
Related Issue(s)
Breaking Change?
Checklist