
Error The query is too large. The maximum standard SQL query length is 1024.00K characters when using insert_rows_method chunk #675

Open · aibazhang opened this issue Mar 6, 2024 · 4 comments

aibazhang (Contributor) commented Mar 6, 2024

Hi guys!

I added the chunk_size arg when calling insert_rows several days ago to fix the error Maximum number of resources referenced per query are 1,000 resources (#669).

However, a new error has occurred: The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.

It appears that the compiled_code column made the following query too long, even though we only have 400 records to insert.

insert into `my-project`.`elementary`.`dbt_run_results`
         (model_execution_id,unique_id,invocation_id,generated_at,created_at,name,message,status,resource_type,execution_time,execute_started_at,execute_completed_at,compile_started_at,compile_completed_at,rows_affected,full_refresh,compiled_code,failures,query_id,thread_id,materialization,adapter_response) values ...
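
For a rough sense of scale (the per-row character counts below are assumptions for illustration, not measurements from our project), a back-of-envelope estimate shows how 400 rows can blow past the 1024.00K-character limit once compiled_code is inlined into the VALUES clause:

```python
# Back-of-envelope estimate; the per-row character counts are assumed
# figures for illustration, not measured values.
ROWS = 400
AVG_COMPILED_CODE_CHARS = 3_000   # assumed size of compiled_code per row
AVG_OTHER_COLUMNS_CHARS = 500     # assumed size of the remaining columns

query_chars = ROWS * (AVG_COMPILED_CODE_CHARS + AVG_OTHER_COLUMNS_CHARS)
print(f"~{query_chars:,} characters")                       # ~1,400,000
print("exceeds ~1,024,000-char limit:", query_chars > 1_024_000)
```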

Is there a way to make max_query_size work even when using insert_rows_method chunk?
If you have any ideas, please let me know; I can submit a PR.
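
One rough idea (a minimal sketch only, not elementary's actual insert_rows implementation; the helper names and default values here are made up for illustration) would be to build chunks that respect both a maximum row count and a maximum rendered-query length, so chunk mode would also honor a size cap:

```python
# Illustrative sketch: split rows into batches bounded by BOTH a row-count
# limit and a rendered-size limit. Names and defaults are assumptions.
from typing import Dict, Iterable, List


def render_row(row: Dict) -> str:
    # Placeholder for however a row gets serialized into the VALUES clause.
    return "(" + ", ".join(repr(v) for v in row.values()) + ")"


def split_rows(rows: Iterable[Dict],
               chunk_size: int = 500,
               max_query_chars: int = 1_000_000,
               base_query_chars: int = 1_000) -> List[List[Dict]]:
    """Split rows into batches bounded by both row count and rendered size."""
    chunks: List[List[Dict]] = []
    current: List[Dict] = []
    current_chars = base_query_chars  # room reserved for the INSERT prefix
    for row in rows:
        row_chars = len(render_row(row)) + 1  # +1 for the separating comma
        # Start a new chunk if either limit would be exceeded.
        if current and (len(current) >= chunk_size
                        or current_chars + row_chars > max_query_chars):
            chunks.append(current)
            current, current_chars = [], base_query_chars
        current.append(row)
        current_chars += row_chars
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be rendered as its own INSERT statement, so a batch stays under both the 1,000-resource limit and the query-length limit.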

Thank you

Environment

- dbt=1.7.8
- adapter: bigquery=1.7.4
- elementary-data/elementary=0.14.1

Variables

elementary:
    "insert_rows_method": "chunk"
    "dbt_artifacts_chunk_size": 500
elongl (Member) commented May 12, 2024

Hi @aibazhang, thanks for opening this issue.
Would you care to elaborate on why you're using the chunk method instead of max_query_size?
I'm asking because I think it'd make more sense to solve a query size issue that's imposed by BigQuery.

tsmo4 commented May 12, 2024

> Hi @aibazhang, thanks for opening this issue. Would you care to elaborate on why you're using the chunk method instead of max_query_size? I'm asking because I think it'd make more sense to solve a query size issue that's imposed by BigQuery.

Speaking to our use case: we have a sizable dbt project, and it appears that we need both options in place for the post-hook run to execute. When uploading the dbt test metadata we hit errors:

  • without the chunk method in place, BigQuery throws a resource error due to the size and complexity of the generated query (not its character length): Maximum number of resources
  • with the chunk method in place, which bypasses the max_query_size restriction, the query becomes too long and BigQuery throws: The query is too large

elongl (Member) commented May 13, 2024

@tsmo4 Have you tried lowering the default query_max_size?

aibazhang (Contributor, Author) commented:
We have a similar use case to @tsmo4's.

> Have you tried lowering the default query_max_size?

We've tried lowering the default query_max_size, but the number of subqueries still exceeds the BigQuery limit because each record is very small:

Error: too many subqueries or query is too complex

Maximum number of resources referenced per query are 1,000 resources

https://cloud.google.com/bigquery/quotas

As a result, we considered using the chunk mode (because it restricts the number of subqueries to a constant), but then ran into the error described in the issue description.
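
For contrast, here is the same kind of rough arithmetic (assumed numbers again, and assuming each inserted row contributes one subquery, as the error message suggests) showing why a size-based split alone can pack far more than 1,000 rows into a single statement when the records are small:

```python
# Rough illustration with assumed numbers; not elementary's actual defaults.
QUERY_MAX_SIZE = 1_000_000   # characters allowed per rendered query
AVG_ROW_CHARS = 300          # assumed size of one small artifact row

rows_per_query = QUERY_MAX_SIZE // AVG_ROW_CHARS
print(rows_per_query)                                   # ~3,333 rows
print("exceeds 1,000-resource limit:", rows_per_query > 1_000)
```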
