[HUDI-7762] Optimizing Hudi Table Check with Delta Lake by Refining Class Name Checks In Spark3.5 #11224
Conversation
What action are we executing here?
We internally use HoodieCatalog to handle Delta tables and other table types, but Hudi (hudi-spark-datasource/hudi-spark3.5.x/src/main/scala/org/apache/spark/sql/adapter/Spark3_5Adapter.scala) calls v1Table even when the table is a Delta table, and Delta then throws an exception. v1Table should not be called when the table is not a Hudi table.
@@ -54,7 +54,7 @@ class Spark3_5Adapter extends BaseSpark3Adapter {
       case plan if !plan.resolved => None
       // NOTE: When resolving Hudi table we allow [[Filter]]s and [[Project]]s be applied
       // on top of it
-      case PhysicalOperation(_, _, DataSourceV2Relation(v2: V2TableWithV1Fallback, _, _, _, _)) if isHoodieTable(v2.v1Table) =>
+      case PhysicalOperation(_, _, DataSourceV2Relation(v2: V2TableWithV1Fallback, _, _, _, _)) if isHoodieTable(v2) =>
@jonvex can you help with the review?
In Hudi, the Spark3_5Adapter calls v2.v1Table, which in turn invokes Delta's internal logic. When executed on a Delta table, this can raise an error. Therefore, the check for whether the operation targets a Hudi table has been changed to a class name check, so that v1Table is no longer invoked and errors during Delta Lake executions are avoided.
When running the Delta tests on Spark 3.5, the following error is reported:
org.apache.spark.sql.delta.DeltaIllegalStateException: [DELTA_INVALID_V1_TABLE_CALL] v1Table call is not expected with path based DeltaTableV2
  at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall(DeltaErrors.scala:1801)
  at org.apache.spark.sql.delta.DeltaErrorsBase.invalidV1TableCall$(DeltaErrors.scala:1800)
  at org.apache.spark.sql.delta.DeltaErrors$.invalidV1TableCall(DeltaErrors.scala:3203)
  at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$v1Table$1(DeltaTableV2.scala:320)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.delta.catalog.DeltaTableV2.v1Table(DeltaTableV2.scala:320)
  at org.apache.spark.sql.adapter.Spark3_5Adapter.$anonfun$resolveHoodieTable$1(Spark3_5Adapter.scala:57)
  at scala.Option.orElse(Option.scala:447)
  at org.apache.spark.sql.adapter.Spark3_5Adapter.resolveHoodieTable(Spark3_5Adapter.scala:52)
  at org.apache.spark.sql.hudi.analysis.HoodieAnalysis$ResolvesToHudiTable$.unapply(HoodieAnalysis.scala:362)
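The idea behind the fix can be sketched as follows. This is an illustrative sketch only, not the actual Hudi implementation: the helper name and the string being matched are assumptions, and the `Table` interface stands in for Spark's `org.apache.spark.sql.connector.catalog.Table`.

```scala
// Hypothetical sketch: decide whether a V2 table belongs to Hudi by inspecting
// its implementation class name, instead of calling v2.v1Table. Calling
// v1Table on a path-based DeltaTableV2 throws DELTA_INVALID_V1_TABLE_CALL,
// so the check must not touch v1Table for non-Hudi tables.
trait Table {
  // Stand-in for Spark's connector Table interface (assumption for this sketch).
}

def isHoodieTableByClassName(table: Table): Boolean = {
  // "HoodieInternalV2Table" is used here as an assumed marker for Hudi's V2
  // table class; a class-name check never triggers another connector's
  // v1Table fallback logic.
  table.getClass.getName.contains("HoodieInternalV2Table")
}
```

The key design point is that the guard is now purely metadata-based (the class of the V2 relation's table object), so resolving a Delta table through HoodieAnalysis falls through to `None` without ever entering Delta's v1Table code path.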
Change Logs
none
Impact
none
Risk level (write none, low medium or high below)
none
Documentation Update
None
Contributor's checklist