[HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5 #11224

Merged: 5 commits, May 31, 2024


majian1998
Contributor

In Hudi, Spark3_5Adapter calls v2.v1Table, which runs Delta's own logic when the table is a Delta table. For a path-based Delta table this call throws an error. This PR therefore changes the check for whether a table is a Hudi table to a class-name check, so that v1Table is no longer called on Delta tables.
When running the Delta tests on Spark 3.5, the following error is reported:
[DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
org.apache.spark.sql.delta.DeltaIllegalStateException: [DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
    at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall(DeltaErrors.scala:1801)
    at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall$(DeltaErrors.scala:1800)
    at org.apache.spark.sql.delta.DeltaErrors$.invalidV1TableCall(DeltaErrors.scala:3203)
    at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$v1Table$1(DeltaTableV2.scala:320)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.delta.catalog.DeltaTableV2.v1Table(DeltaTableV2.scala:320)
    at org.apache.spark.sql.adapter.Spark3_5Adapter.$anonfun$resolveHoodieTable$1(Spark3_5Adapter.scala:57)
    at scala.Option.orElse(Option.scala:447)
    at org.apache.spark.sql.adapter.Spark3_5Adapter.resolveHoodieTable(Spark3_5Adapter.scala:52)
    at org.apache.spark.sql.hudi.analysis.HoodieAnalysis$ResolvesToHudiTable$.unapply(HoodieAnalysis.scala:362)
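
As a rough illustration of the class-name check described above, the sketch below decides whether a table is a Hudi table by looking only at its runtime class name, without calling any method on the table. All types here are hypothetical stand-ins defined for the example (they are not the real Spark, Delta, or Hudi classes), and the marker name is an assumption; the actual check in the PR may differ.

```scala
// Hypothetical stand-ins for illustration; not the real Spark, Delta, or Hudi classes.
trait Table { def name: String }

final class HoodieInternalV2Table(val name: String) extends Table
final class DeltaTableV2(val name: String) extends Table

object ClassNameCheck {
  // Assumed marker: the class name of the Hudi V2 table implementation.
  private val HoodieV2TableClassName = "HoodieInternalV2Table"

  // Decide by runtime class name only; no method on the table itself is invoked,
  // so a table whose accessors can throw is never touched.
  def isHoodieTable(table: Table): Boolean =
    table.getClass.getName.endsWith(HoodieV2TableClassName)
}
```

Comparing class names rather than using isInstanceOf also avoids a compile-time dependency on the other format's classes, which is one plausible reason to prefer it here.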

Change Logs

none

Impact

none

Risk level (write none, low medium or high below)

none

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:XS PR with lines of changes in <= 10 label May 15, 2024
@danny0405
Contributor

When executed on a Delta table, this may result in an error.

What action are we executing here?

@leesf
Contributor

leesf commented May 22, 2024

When executed on a Delta table, this may result in an error.

What action are we executing here?

For example, a statement like INSERT OVERWRITE delta.`/tmp/delta-table` SELECT col1 as id FROM VALUES 5,6,7,8,9; from https://docs.delta.io/latest/quick-start.html

Internally we use HoodieCatalog to handle Delta tables as well as other table types. However, Hudi (hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala) calls v1Table even when the table is a Delta table, and Delta then throws an exception. v1Table should not be called on a table that is not a Hudi table.

@@ -54,7 +54,7 @@ class Spark3_5Adapter extends BaseSpark3Adapter {
         case plan if !plan.resolved => None
         // NOTE: When resolving Hudi table we allow [[Filter]]s and [[Project]]s be applied
         //       on top of it
-        case PhysicalOperation(_, _, DataSourceV2Relation(v2: V2TableWithV1Fallback, _, _, _, _)) if isHoodieTable(v2.v1Table) =>
+        case PhysicalOperation(_, _, DataSourceV2Relation(v2: V2TableWithV1Fallback, _, _, _, _)) if isHoodieTable(v2) =>
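
To make the effect of the guard change concrete, here is a minimal, self-contained sketch. It assumes a simplified model: CatalogTable, V2TableWithV1Fallback, PathBasedDeltaTable, and HoodieV2Table below are hypothetical types written for this example, not the real Spark, Delta, or Hudi ones. The old guard shape calls v1Table while matching, so it fails for a table whose v1Table throws; the new shape inspects only the class and calls v1Table after the guard has passed.

```scala
// Hypothetical model of the guard change; not the real Spark, Delta, or Hudi types.
final case class CatalogTable(provider: String)

trait V2TableWithV1Fallback { def v1Table: CatalogTable }

// Mimics a path-based Delta table: asking for the V1 view fails.
final class PathBasedDeltaTable extends V2TableWithV1Fallback {
  def v1Table: CatalogTable =
    throw new IllegalStateException("v1Table call is not expected with path based table")
}

final class HoodieV2Table extends V2TableWithV1Fallback {
  def v1Table: CatalogTable = CatalogTable("hudi")
}

object GuardSketch {
  // Old shape: the guard itself calls v1Table, so it runs for every matched table.
  def resolveOld(t: V2TableWithV1Fallback): Option[CatalogTable] = t match {
    case v2 if v2.v1Table.provider == "hudi" => Some(v2.v1Table)
    case _ => None
  }

  // New shape: the guard inspects only the class name; v1Table runs after it passes.
  def resolveNew(t: V2TableWithV1Fallback): Option[CatalogTable] = t match {
    case v2 if v2.getClass.getName.endsWith("HoodieV2Table") => Some(v2.v1Table)
    case _ => None
  }
}
```

In this model, resolveOld throws for PathBasedDeltaTable, while resolveNew returns None for it and still resolves HoodieV2Table.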
Contributor

@jonvex can you help for the review?

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@leesf leesf merged commit 130ea1a into apache:master May 31, 2024
46 checks passed
Labels
size:XS PR with lines of changes in <= 10
4 participants