[SUPPORT] Run Merge On Read Compactions #11249
Comments
@jai20242 try compaction schedule first.
I tried it but it didn't work:
1) Connect
2) Show commits
3) Run compaction
4) Schedule compaction
5) Run compaction again

And the Hudi path is:
2) The files in a partition
@jai20242 If you have only 2 delta commits, there will be nothing to compact by default.
I set the parameter hoodie.compact.inline.max.delta.commits to 1 (you can see it in the first comment).
@jai20242 That is a writer configuration. Hudi doesn't persist it. When you run compaction from the CLI, you need to pass it there too.
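A rough model of the two maintainer comments, assuming the default delta-commit-count trigger: compaction is only scheduled once the number of delta commits since the last compaction reaches hoodie.compact.inline.max.delta.commits (default 5), and that value is taken from the options supplied at scheduling time, not from the earlier writer. The object and method names are hypothetical; this is a sketch, not Hudi's implementation.

```scala
// Hypothetical model of the delta-commit-count compaction trigger (not Hudi source code).
object CompactionTriggerSketch {
  // Default value of hoodie.compact.inline.max.delta.commits when it is not supplied.
  val DefaultMaxDeltaCommits: Int = 5

  // actions: completed timeline actions, oldest first, e.g. Seq("deltacommit", "deltacommit").
  // On a MERGE_ON_READ timeline a finished compaction is recorded as "commit".
  def deltaCommitsSinceLastCompaction(actions: Seq[String]): Int =
    actions.reverse.takeWhile(_ != "commit").count(_ == "deltacommit")

  // conf: only the options passed to *this* scheduling call; options used by an
  // earlier writer are not stored in the table, so they do not appear here.
  def shouldSchedule(actions: Seq[String], conf: Map[String, String]): Boolean = {
    val maxDeltaCommits = conf
      .get("hoodie.compact.inline.max.delta.commits")
      .map(_.trim.toInt)
      .getOrElse(DefaultMaxDeltaCommits)
    deltaCommitsSinceLastCompaction(actions) >= maxDeltaCommits
  }
}
```

Under this model, two delta commits with no options give false, while passing the threshold of 1 at scheduling time gives true.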
I tried adding the configuration to both compaction schedule and compaction run, but it didn't work.
hudi->connect --path /tmp/dep_hudi2
hudi:prueba->compaction run --tableName prueba --hoodieConfigs "hoodie.compact.inline.max.delta.commits=1"
Hi.
I have ingested data using the following configuration:
option(OPERATION_OPT_KEY, "upsert").
option(CDC_ENABLED.key(), "true").
option(TABLE_NAME, tableName).
option("hoodie.datasource.write.payload.class","CustomOverwriteWithLatestAvroPayload").
option("hoodie.avro.schema.validate","false").
option("hoodie.datasource.write.recordkey.field",keysTable.mkString(",")).
option("hoodie.datasource.write.precombine.field",COLUMN_TO_SORT).
option("hoodie.datasource.write.new.columns.nullable", "true").
option("hoodie.datasource.write.reconcile.schema","true").
option("hoodie.metadata.enable","false").
option("hoodie.index.type","SIMPLE").
option("hoodie.datasource.write.table.type","MERGE_ON_READ").
option("hoodie.compact.inline","false").
option("hoodie.datasource.write.partitionpath.field","bdp_partition").
option("hoodie.compact.inline.max.delta.commits","1").
mode(Append).
save(dataPath)
However, I can't run the compaction. The configuration above sets hoodie.compact.inline.max.delta.commits = 1, but if I remove that option and execute 2 commits, the same thing happens.
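The writer options set hoodie.compact.inline to false. A hypothetical sketch of what a batch Spark write does about compaction after each delta commit follows. It assumes that with inline compaction off, the writer neither schedules nor executes compaction unless hoodie.compact.schedule.inline is enabled (an option in recent Hudi releases; check it exists in your version). The names are illustrative, not Hudi classes.

```scala
// Hypothetical model of the writer-side compaction decision after a delta commit.
object WriterCompactionSketch {
  sealed trait Outcome
  case object NothingScheduled extends Outcome
  case object ScheduleOnly extends Outcome
  case object ScheduleAndExecute extends Outcome

  // Treats a missing option as "false", matching the defaults assumed above.
  private def flag(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.trim.equalsIgnoreCase("true"))

  // thresholdReached: whether the delta-commit count has hit the configured maximum.
  def afterDeltaCommit(conf: Map[String, String], thresholdReached: Boolean): Outcome =
    if (!thresholdReached) NothingScheduled
    else if (flag(conf, "hoodie.compact.inline")) ScheduleAndExecute
    else if (flag(conf, "hoodie.compact.schedule.inline")) ScheduleOnly
    else NothingScheduled
}
```

Under this model, the configuration above never makes the writer produce a compaction plan, so the threshold of 1 only matters once something else (inline compaction, the CLI, or a separate job) schedules compaction.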
The .hoodie folder contains the following files:
. .schema
.. .temp
.20240516143453846.deltacommit.crc 20240516143453846.deltacommit
.20240516143453846.deltacommit.inflight.crc 20240516143453846.deltacommit.inflight
.20240516143453846.deltacommit.requested.crc 20240516143453846.deltacommit.requested
.20240516144403250.deltacommit.crc 20240516144403250.deltacommit
.20240516144403250.deltacommit.inflight.crc 20240516144403250.deltacommit.inflight
.20240516144403250.deltacommit.requested.crc 20240516144403250.deltacommit.requested
.20240516154539132.deltacommit.crc 20240516154539132.deltacommit
.20240516154539132.deltacommit.inflight.crc 20240516154539132.deltacommit.inflight
.20240516154539132.deltacommit.requested.crc 20240516154539132.deltacommit.requested
.aux archived
.hoodie.properties.crc hoodie.properties
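A small sketch for reading this listing, assuming the pre-1.0 Hudi timeline naming <instantTime>.<action>[.requested|.inflight], where a pending compaction appears as <instantTime>.compaction.requested (or .inflight) and completes as <instantTime>.commit. The helper names are hypothetical.

```scala
// Hypothetical helpers for classifying .hoodie timeline file names.
object TimelineListingSketch {
  // "<instantTime>.<action>" optionally followed by ".requested" or ".inflight";
  // hidden .crc files, hoodie.properties, .aux, archived, etc. do not match.
  private val InstantName = """^(\d+)\.([a-z_]+)(?:\.(requested|inflight))?$""".r

  final case class TimelineInstant(time: String, action: String, state: String)

  def parse(fileNames: Seq[String]): Seq[TimelineInstant] =
    fileNames.collect { case InstantName(time, action, state) =>
      // A missing state suffix means the instant has completed.
      TimelineInstant(time, action, Option(state).getOrElse("completed"))
    }

  def completedDeltaCommits(instants: Seq[TimelineInstant]): Seq[String] =
    instants.filter(i => i.action == "deltacommit" && i.state == "completed").map(_.time).distinct.sorted

  // A compaction is pending if its instant has no completed "commit" yet.
  def pendingCompactions(instants: Seq[TimelineInstant]): Seq[String] = {
    val finished = instants.filter(i => i.action == "commit" && i.state == "completed").map(_.time).toSet
    instants.filter(_.action == "compaction").map(_.time).filterNot(finished).distinct.sorted
  }
}
```

Applied to the file names above, this yields three completed delta commits and no compaction instant at all, which is consistent with the empty compactions show all output.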
And the contents of one partition:
.
..
..47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_20240516143453846.log.1_1-26-114.crc
..47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_20240516143453846.log.2_1-60-262.crc
..7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_20240516143453846.log.1_0-26-113.crc
..7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_20240516143453846.log.2_0-60-261.crc
..hoodie_partition_metadata.crc
.47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_0-26-105_20240516143453846.parquet.crc
.47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_20240516143453846.log.1_1-26-114
.47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_20240516143453846.log.2_1-60-262
.7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_1-26-106_20240516143453846.parquet.crc
.7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_20240516143453846.log.1_0-26-113
.7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_20240516143453846.log.2_0-60-261
.hoodie_partition_metadata
47546248-a9d6-4a99-9c56-6dc1b0c9ad82-0_0-26-105_20240516143453846.parquet
7504f0fe-c40f-4bfa-88c0-bf905840f04b-0_1-26-106_20240516143453846.parquet
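Assuming the usual naming (base files <fileId>_<writeToken>_<instantTime>.parquet, log files .<fileId>_<baseInstantTime>.log.<version>_<writeToken>), the partition can be grouped into file groups with a sketch like this. The helper is hypothetical and ignores the hidden .crc checksum files.

```scala
// Hypothetical helper that counts base and log files per file group.
object FileSliceSketch {
  // Base file: <fileId>_<writeToken>_<instantTime>.parquet (no leading dot).
  private val BaseFile = """^([^_.][^_]*)_([^_]+)_(\d+)\.parquet$""".r
  // Log file: .<fileId>_<baseInstantTime>.log.<version>_<writeToken>; a trailing ".crc" does not match.
  private val LogFile = """^\.([^_.][^_]*)_(\d+)\.log\.(\d+)_([^_.]+)$""".r

  final case class FileGroup(fileId: String, baseFiles: Int, logFiles: Int)

  def group(fileNames: Seq[String]): Seq[FileGroup] = {
    // Tag each recognised file with its file id and whether it is a base file.
    val tagged: Seq[(String, Boolean)] = fileNames.collect {
      case BaseFile(fileId, _, _)   => (fileId, true)
      case LogFile(fileId, _, _, _) => (fileId, false)
    }
    tagged.groupBy(_._1).toSeq.sortBy(_._1).map { case (id, files) =>
      FileGroup(id, files.count(_._2), files.count(!_._2))
    }
  }
}
```

For the partition above, each of the two file groups has one base file and two log files, so there are log records a compaction plan could merge.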
Finally, I am trying to compact using the CLI.
I can see two commits:
hudi:prueba->commits show --sortBy "Total Bytes Written" --desc true --limit 10
╔═══════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
║ CommitTime │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
╠═══════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
║ 20240516144403250 │ 752,5 MB │ 0 │ 14 │ 7 │ 1435323 │ 1435323 │ 0 ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20240516143453846 │ 41,7 MB │ 14 │ 0 │ 7 │ 1435323 │ 0 │ 0 ║
╚═══════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
But there are no compactions:
compactions show all
╔═════════════════════════╤═══════╤═══════════════════════════════╗
║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
╠═════════════════════════╧═══════╧═══════════════════════════════╣
║ (empty) ║
╚═════════════════════════════════════════════════════════════════╝
And the compaction run command returns the following message (after executing compaction schedule):
hudi:prueba->compaction run --tableName prueba
2024-05-16 14:17:08.633 INFO 58141 --- [ main] o.a.h.c.t.t.HoodieActiveTimeline : Loaded instants upto : Option{val=[20240516134708181__deltacommit__COMPLETED__20240516135028000]}
NO PENDING COMPACTION TO RUN
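A rough model of this message, assuming that compaction run without --compactionInstant picks the earliest compaction instant on the timeline that has not completed yet, and reports NO PENDING COMPACTION TO RUN when there is none. In other words, compaction run can only execute a plan that compaction schedule has already written to the timeline. The names are hypothetical.

```scala
// Hypothetical model of how `compaction run` chooses an instant when none is given.
object CompactionRunSketch {
  final case class Instant(time: String, action: String, completed: Boolean)

  // Right(instantTime) for the earliest not-yet-completed compaction, otherwise Left(message).
  def pickInstant(timeline: Seq[Instant]): Either[String, String] = {
    // A compaction that finished is recorded as a completed "commit" with the same time.
    val done = timeline.filter(i => i.action == "commit" && i.completed).map(_.time).toSet
    timeline
      .filter(i => i.action == "compaction" && !done(i.time))
      .map(_.time)
      .sorted
      .headOption
      .toRight("NO PENDING COMPACTION TO RUN")
  }
}
```

With a timeline that only holds completed delta commits, as in the log line above, the sketch returns the same "no pending compaction" result.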