Assertions fail on partitioned BigQuery tables with requirePartitionFilter enabled #1622

Open
RyuSA opened this issue Dec 21, 2023 · 0 comments

Problem Summary:

Assertions on SQLX scripts fail with the following error when config.bigquery.requirePartitionFilter is set to true:

Cannot query over table 'dataform.repro' without a filter over column(s) 'date' that can be used for partition elimination

Minimal Reproduction:

config {
  type: "incremental", // or "table"
  bigquery: {
    partitionBy: "date",
    requirePartitionFilter: true,
  },
  assertions: {
    uniqueKey: "id"
  }
}

SELECT
  CURRENT_DATE('Japan') AS date,
  1 AS id,

The error message originates from BigQuery. BigQueryAdapter uses the inherited Adapter.indexAssertion to build the assertion query and ignores BigQueryOptions.

As a result, the generated assertion SQL contains no filter on the partition column.

Problem Details:

This is the assertion SQL for the minimal reproduction.

SELECT
  *
FROM (
  SELECT
    id,
    COUNT(1) AS index_row_count
  FROM `project.dataform.repro`
  -- missing: a WHERE filter on the partition column "date"
  GROUP BY id
  ) AS data
WHERE index_row_count > 1

The SQL is generated by Adapter.indexAssertion, and it has no filter on the partition column. Neither BigQueryAdapter.indexAssertion nor BigQueryAdapter.rowConditionsAssertion takes BigQueryOptions into account, so the partition settings are dropped and the assertion query is emitted without a partition clause.

BigQueryAdapter should provide its own implementations of these methods, along the lines below.

export class BigQueryAdapter extends Adapter implements IAdapter {

    // for uniqueKey
    public indexAssertion(dataset: string, indexCols: string[]) {
        const commaSeparatedColumns = indexCols.join(", ");
        // do something to build a filter on the partition column(s)
        const partitionColumnStatement = ...

        return `
SELECT
  *
FROM (
  SELECT
    ${commaSeparatedColumns},
    COUNT(1) AS index_row_count
  FROM ${dataset}
  WHERE ${partitionColumnStatement}
  GROUP BY ${commaSeparatedColumns}
  ) AS data
WHERE index_row_count > 1
`
    }
}
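
To make the idea concrete, here is a standalone sketch of how the uniqueKey query could be built with an optional partition filter. It is only an illustration under assumptions: the function name indexAssertionSql, the PartitionFilterOptions type, and the partitionFilter field are all hypothetical and are not part of Dataform's API. Where the predicate should come from (a new config field, a default derived from partitionBy, or something else) is left open. The filter has to sit in a WHERE clause before GROUP BY so that BigQuery can use it for partition elimination.

```typescript
// Hypothetical options type: NOT part of Dataform. It only carries a
// user-supplied SQL predicate on the partition column.
interface PartitionFilterOptions {
  partitionFilter?: string;
}

// Hypothetical standalone helper that mirrors the shape of the uniqueKey
// assertion query shown above, with an optional partition filter.
function indexAssertionSql(
  dataset: string,
  indexCols: string[],
  options: PartitionFilterOptions = {}
): string {
  const commaSeparatedColumns = indexCols.join(", ");
  // Emit the WHERE clause only when a predicate was supplied, so tables
  // without requirePartitionFilter get the same query as before.
  const whereClause = options.partitionFilter
    ? `\n  WHERE ${options.partitionFilter}`
    : "";
  return `
SELECT
  *
FROM (
  SELECT
    ${commaSeparatedColumns},
    COUNT(1) AS index_row_count
  FROM ${dataset}${whereClause}
  GROUP BY ${commaSeparatedColumns}
  ) AS data
WHERE index_row_count > 1
`;
}

// Example: check uniqueness of "id" over partitions from a given date on.
const exampleSql = indexAssertionSql("`project.dataform.repro`", ["id"], {
  partitionFilter: "date >= DATE '2023-12-01'",
});
console.log(exampleSql);
```

Note that any such filter narrows the assertion: uniqueness is then only checked within the partitions the predicate selects, so the choice of predicate changes what the assertion guarantees.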