Search before asking
I searched in the issues and found nothing similar.
Motivation
When a large amount of data passes through Paimon CDC, roughly 100 million records land in a Paimon ODS table whose changelog producer is set to `input`. I then start a Flink SQL job (with `consumer-id` configured) that streams from this table and writes into a Paimon DWD table (whose changelog producer is `lookup`). After the job starts, its checkpoint stays stuck at 0% and never completes, so no snapshot is committed. As a result, my downstream Flink SQL job reading the DWD table sees no data. In effect, the entire backlog of one Paimon table must be fully written into the next table before data can move on to the table after that; data cannot flow from job to job continuously like a stream.
Solution
Add sharded reading. For large Paimon tables, the reading job would split the backlog into shards, similar to Flink CDC's incremental snapshot reading. Once one shard is finished, the job moves on to the next, so checkpoints can complete between shards and data can flow smoothly between the various Paimon tables.
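The idea above can be illustrated with a minimal sketch (hypothetical, not Paimon's actual API): the backlog is cut into bounded shards, and a checkpoint barrier can pass after each shard instead of only after the whole table has been read.

```python
# Hypothetical sketch of shard-by-shard reading with checkpoints between
# shards. Names (plan_shards, read_with_checkpoints) are illustrative only.

def plan_shards(total_records, shard_size):
    """Cut a large backlog into bounded shards of at most shard_size records."""
    shards = []
    start = 0
    while start < total_records:
        end = min(start + shard_size, total_records)
        shards.append((start, end))
        start = end
    return shards

def read_with_checkpoints(total_records, shard_size):
    """Read shard by shard; record a checkpoint offset after each shard."""
    checkpoints = []
    for start, end in plan_shards(total_records, shard_size):
        # ... read records in [start, end) and emit them downstream ...
        checkpoints.append(end)  # a checkpoint can complete at this boundary
    return checkpoints
```

With 100 million records and 10-million-record shards, ten checkpoints complete along the way, so downstream snapshots are committed incrementally rather than one checkpoint being stuck until the entire backlog is consumed.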
Anything else?
No response
Are you willing to submit a PR?
I'm willing to submit a PR!