Search before asking
I searched in the issues and found nothing similar.
Motivation
When a large amount of data passes through Paimon CDC, roughly 100 million records land in a Paimon ODS table whose changelog producer is set to `input`. I then start a Flink SQL job (with `consumer-id` configured) that streams from this table and writes into a Paimon DWD table (whose changelog producer is `lookup`). After the job starts, its checkpoint stays stuck at 0% and never completes, so no snapshot is committed. As a result, my downstream Flink SQL job reading the DWD table sees no data. In effect, the entire backlog of one Paimon table must be fully written into the next table before data can move on to the table after that; data cannot flow from job to job continuously like a stream.
Solution
Add sharded reading. For large Paimon tables, the reading job would split the backlog into shards, similar to Flink CDC's incremental snapshot reading. Once one shard is finished, the job moves on to the next, so checkpoints can complete between shards and data can flow smoothly between the various Paimon tables.
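The idea above can be illustrated with a minimal sketch (hypothetical, not Paimon's actual API): the backlog is cut into bounded shards, and a checkpoint barrier can pass after each shard instead of only after the whole table has been read.

```python
# Hypothetical sketch of shard-by-shard reading with checkpoints between
# shards. Names (plan_shards, read_with_checkpoints) are illustrative only.

def plan_shards(total_records, shard_size):
    """Cut a large backlog into bounded shards of at most shard_size records."""
    shards = []
    start = 0
    while start < total_records:
        end = min(start + shard_size, total_records)
        shards.append((start, end))
        start = end
    return shards

def read_with_checkpoints(total_records, shard_size):
    """Read shard by shard; record a checkpoint offset after each shard."""
    checkpoints = []
    for start, end in plan_shards(total_records, shard_size):
        # ... read records in [start, end) and emit them downstream ...
        checkpoints.append(end)  # a checkpoint can complete at this boundary
    return checkpoints
```

With 100 million records and 10-million-record shards, ten checkpoints complete along the way, so downstream snapshots are committed incrementally rather than one checkpoint being stuck until the entire backlog is consumed.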
Anything else?
No response
Are you willing to submit a PR?
I'm willing to submit a PR!