
[SUPPORT] java.lang.OutOfMemoryError: Requested array size exceeds VM limit on data ingestion to COW table #11122

Open
TarunMootala opened this issue Apr 29, 2024 · 9 comments
Labels
priority:major (degraded perf; unable to move forward; potential bugs), spark (Issues related to spark), spark-streaming (spark structured streaming related)

Comments

TarunMootala commented Apr 29, 2024

Describe the problem you faced
We have a Spark streaming job that reads data from an input stream and appends it to a COW table partitioned on subject area. The streaming job has a batch interval of 120 seconds.

Intermittently, the job fails with the error

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

To Reproduce

No specific steps.

Expected behavior

The job should commit the data successfully and continue with the next micro-batch.

Environment Description

  • Hudi version : 0.12.1 (Glue 4.0)

  • Spark version : Spark 3.3.0

  • Hive version : N/A

  • Hadoop version : N/A

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

We are not sure of the exact root cause or fix. However, the workaround (not ideal) is to manually delete (archive) a few of the oldest Hudi metadata files from the active timeline (.hoodie folder) and reduce hoodie.keep.max.commits. This only works when we reduce max commits, and whenever max commits is reduced the job runs fine for a few months before failing again.

Our requirement is to retain 1500 commits to enable incremental query capability over the last 2 days of changes. We initially started with max commits of 1500 and have gradually come down to 400.
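For context, a quick back-of-the-envelope check (assuming one Hudi commit per 120-second micro-batch) of how many commits cover 2 days of changes:

    # Assumption: one commit per micro-batch at a 120-second trigger interval.
    seconds_per_day = 86_400
    batch_interval_seconds = 120
    commits_for_two_days = 2 * seconds_per_day // batch_interval_seconds
    print(commits_for_two_days)  # 1440, hence the original target of roughly 1500 commits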

Hudi Config

            "hoodie.table.name": "table_name",
            "hoodie.datasource.write.keygenerator.type": "COMPLEX",
            "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
            "hoodie.datasource.write.partitionpath.field": "entity_name",
            "hoodie.datasource.write.recordkey.field": "partition_key,sequence_number",
            "hoodie.datasource.write.precombine.field": "approximate_arrival_timestamp",
            "hoodie.datasource.write.operation": "insert",
            "hoodie.insert.shuffle.parallelism": 10,
            "hoodie.bulkinsert.shuffle.parallelism": 10,
            "hoodie.upsert.shuffle.parallelism": 10,
            "hoodie.delete.shuffle.parallelism": 10,
            "hoodie.metadata.enable": "false",
            "hoodie.datasource.hive_sync.use_jdbc": "false",
            "hoodie.datasource.hive_sync.enable": "false",
            "hoodie.datasource.hive_sync.database": "database_name",
            "hoodie.datasource.hive_sync.table": "table_name",
            "hoodie.datasource.hive_sync.partition_fields": "entity_name",
            "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
            "hoodie.datasource.hive_sync.support_timestamp": "true",
            "hoodie.keep.min.commits": 450,  # to preserve commits for at least 2 days with processingTime="120 seconds"
            "hoodie.keep.max.commits": 480,  # to preserve commits for at least 2 days with processingTime="120 seconds"
            "hoodie.cleaner.commits.retained": 449,

Stacktrace


java.lang.OutOfMemoryError: Requested array size exceeds VM limit

We debugged multiple failure logs; the job always fails at the stage collect at HoodieSparkEngineContext.java:118 (CleanPlanActionExecutor).

@phani482

The same issue was reported here in the past and is still open for RCA: #7800


ad1happy2go commented Apr 30, 2024

@TarunMootala Is it possible for you to upgrade the Hudi version to 0.14.1 and check if you still see this issue? The other issue was related to loading of the archival timeline in the sync, which was fixed in later releases. #7561

@TarunMootala

@ad1happy2go
Thanks for your inputs.
I don't think it was related to loading of the archival timeline. When this error occurred, the first thing I tried was cleaning the archival timeline (.hoodie/archived/), and it didn't help. Only deleting (archiving) a few of the oldest Hudi metadata files from the active timeline (.hoodie folder) and reducing hoodie.keep.max.commits resolved the issue.

@ad1happy2go

@TarunMootala Can you check the size of the timeline files? Can you post the driver logs?
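For reference, a minimal sketch (bucket name and table prefix are placeholders) of how the timeline file count and total size could be measured on S3 with boto3:

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    total_bytes, total_files = 0, 0
    # Walk everything under the table's .hoodie/ folder (active + archived timeline).
    for page in paginator.paginate(Bucket="my-bucket", Prefix="path/to/table/.hoodie/"):
        for obj in page.get("Contents", []):
            total_files += 1
            total_bytes += obj["Size"]

    print(f"{total_files} files, {total_bytes / 1024 / 1024:.1f} MB under .hoodie/")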


TarunMootala commented Apr 30, 2024

@ad1happy2go
The .hoodie/ folder is 350 MB and has 3,435 files (this includes the active and archival timelines).
The .hoodie/archived/ folder is 327 MB and has 695 files (archival timeline only).

Attached driver logs
log-events-viewer-result.csv

@ad1happy2go

@TarunMootala The size itself doesn't look that big. I couldn't locate the error in the log. Can you check once?

@TarunMootala

@ad1happy2go,

When AWS Glue encounters an OOME, it kills the JVM immediately. That could be the reason the error is not available in the driver logs. However, the error is present in the output logs and is the same as given in the overview.

@ad1happy2go

@TarunMootala Can you share the timeline? Do you know how many file groups are in the clean instant?

@codope added the spark (Issues related to spark), spark-streaming (spark structured streaming related), and priority:major (degraded perf; unable to move forward; potential bugs) labels on May 2, 2024

TarunMootala commented May 2, 2024

@ad1happy2go

Can you share the timeline?

Can you elaborate on this?

Do you know how many file groups are there in the clean instant?

Are you referring to the number of files in that particular cleaner run?
