Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance manifest structure validation and add debug logging #884

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

AlgirdasDubickas
Copy link

This commit introduces several improvements to the manifest file processing logic to ensure robustness and ease of debugging:

  1. Manifest Structure Validation: Added checks to verify that the manifest file contains the expected top-level keys ("nodes", "sources", "exposures").

  2. Data Type Validation: Implemented additional validation to confirm that the "nodes" key maps to a dictionary. This step helps catch issues where the manifest's format deviates from expected structures, particularly in how nodes are represented.

  3. Key Presence Validation: For each node within the manifest, we now verify the presence of essential keys ("resource_type", "original_file_path", "tags", "config").

Description

Related Issue(s)

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

This commit introduces several improvements to the manifest file processing logic to ensure robustness and ease of debugging:

1. **Manifest Structure Validation**: Added checks to verify that the manifest file contains the expected top-level keys ("nodes", "sources", "exposures"). 

2. **Data Type Validation**: Implemented additional validation to confirm that the "nodes" key maps to a dictionary. This step helps catch issues where the manifest's format deviates from expected structures, particularly in how nodes are represented.

3. **Key Presence Validation**: For each node within the manifest, we now verify the presence of essential keys ("resource_type", "original_file_path", "tags", "config").
@AlgirdasDubickas AlgirdasDubickas requested a review from a team as a code owner March 9, 2024 07:17
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Mar 9, 2024
@dosubot dosubot bot added area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:parse Primarily related to dbt parse command or functionality parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing labels Mar 9, 2024
Copy link

netlify bot commented Mar 9, 2024

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit a57c76f
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/65ec0d0ea2ae8b0008e7e025
😎 Deploy Preview https://deploy-preview-884--sunny-pastelito-5ecb04.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Mar 9, 2024

Deploy Preview for sunny-pastelito-5ecb04 ready!

Name Link
🔨 Latest commit 1511f82
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/65f053f41975e30008de6486
😎 Deploy Preview https://deploy-preview-884--sunny-pastelito-5ecb04.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Collaborator

@tatiana tatiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlgirdasDubickas thanks for the contribution!

Some comments:

  1. Did you run into some issue that motivated this PR..?

  2. Is the overall goal to check if the manifest contains fields that Cosmos relies on, or to check the manifest is compliant to dbt schema in a broader sense? I was particularly curious of in which scenario nodes are not a dictionary.

  3. If the answer for (2) is to validate the dbt schema, would it make sense to use dbt's artifact schemas to validate this? https://schemas.getdbt.com/, and validate the manifest based on the dbt version?

  4. Should we allow users to opt-in validating the schema or not, via a configuration in RenderConfig? Since this may add additional processing to DAG parsing, it may be worth to have a configuration for enabling / disabling this type of check.

Last but not least, it would be great to have some tests for this feature, once we confirm the original problem / overall approach

Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @AlgirdasDubickas, I wanted to share some thoughts regarding the proposed change. It seems like implementing this might introduce some additional latency during DAG parsing. Additionally, I noticed that there haven't been any updates on the PR for quite some time. I was just wondering if you are still planning to work on this or if there's anything I can assist with.

raise CosmosLoadDbtException("Invalid structure in manifest file")

# Further validation can be performed here, e.g., checking data types
if not isinstance(manifest["nodes"], dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be a dictionary. Have you ever encountered a situation where it wasn't?

Comment on lines +403 to +405
if not all(key in manifest for key in ["nodes", "sources", "exposures"]):
logger.error("Manifest file structure is not as expected.")
raise CosmosLoadDbtException("Invalid structure in manifest file")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering what happens today if this validation isn't in place when you have an incorrect manifest?

@tatiana tatiana added this to the 1.6.0 milestone May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:parsing Related to parsing DAG/DBT improvement, issues, or fixes cosmos dbt:parse Primarily related to dbt parse command or functionality parsing:dbt_manifest Issues, questions, or features related to dbt_manifest parsing size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants