Feature description
We should consider limiting the size of intermediary files by default, as many destinations (e.g. BigQuery) have a maximum file size they can handle. Otherwise a pipeline might run for 2h and then fail terminally with an error like the following (4294967296 bytes is 4 GiB, BigQuery's limit for non-splittable JSON files):

[ERROR ]|17898|8637379136|dlt|load.py|complete_jobs:311|Job for analytics_events.d35853bf5f.jsonl failed terminally in load 1715003335.526644 with message {"error_result":{"reason":"invalid","message":"Error while reading data, error message: Input JSON files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 5706284890. Max allowed size is: 4294967296."},"errors":[{"reason":"invalid","message":"Error while reading data, error message: Input JSON files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 5706284890. Max allowed size is: 4294967296."}],"job_start":"2024-05-06T15:37:59.326000Z","job_end":"2024-05-06T15:37:59.405000Z","job_id":"analytics_events_d35853bf5f_0_jsonl"}
See Slack thread for context: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1715010931231539
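Until such a default exists, the file size can be capped explicitly. A minimal workaround sketch, assuming dlt's data-writer configuration exposes a file_max_bytes setting under [normalize.data_writer] and follows the usual section-to-env-var naming convention (verify both against the dlt performance docs):

```python
import os

# Workaround sketch: cap intermediary file size explicitly so no single
# .jsonl file crosses BigQuery's 4 GiB limit for non-splittable JSON.
# The env var name assumes dlt's config convention for
# [normalize.data_writer] file_max_bytes -- check the docs before relying on it.
os.environ["NORMALIZE__DATA_WRITER__FILE_MAX_BYTES"] = str(1024 * 1024 * 1024)  # 1 GiB
```

With a 1 GiB cap, the buffered writer would rotate to a new file well before a long-running pipeline can hit the 4 GiB load limit.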
Are you a dlt user?
None
Use case
No response
Proposed solution
- Add a new destination capability: recommended file size.
- In the buffered writer, when destination capabilities are present and no explicit limit is set, use it (sketched below).
- Set it for BigQuery; 1 GB looks like a safe option. Check whether Snowflake and Databricks publish similar recommendations and follow them if they do; otherwise leave it as None.
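A minimal sketch of the proposed wiring. The class below is a stand-in for dlt's destination capabilities context (only the new field is shown), and resolve_file_max_bytes is an illustrative helper, not existing dlt code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DestinationCapabilitiesContext:
    """Stand-in for dlt's capabilities context; only the proposed field is shown."""

    recommended_file_size: Optional[int] = None  # bytes; None = no recommendation


def resolve_file_max_bytes(
    explicit_limit: Optional[int],
    caps: Optional[DestinationCapabilitiesContext],
) -> Optional[int]:
    # An explicitly configured limit always wins; otherwise fall back to the
    # destination's recommendation, if any.
    if explicit_limit is not None:
        return explicit_limit
    if caps is not None:
        return caps.recommended_file_size
    return None


# BigQuery caps non-splittable JSON files at 4 GiB, so recommend 1 GiB.
bigquery_caps = DestinationCapabilitiesContext(recommended_file_size=1024 * 1024 * 1024)

assert resolve_file_max_bytes(None, bigquery_caps) == 1024**3
assert resolve_file_max_bytes(512 * 1024**2, bigquery_caps) == 512 * 1024**2
```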
Related issues
No response