[SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec #46523

AngersZhuuuu · 2024-05-10T08:50:29Z

What changes were proposed in this pull request?

This is a new approach to fixing SPARK-39551.
The problem occurs in AQEPropagateEmptyRelation when one side of a join is empty and the other side is just a BroadcastQueryStageExec.
This PR skips the propagation in that case, instead of reverting the result of all queryStagePreparationRules.
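
The guard described above can be illustrated with a toy model. This is only a minimal sketch, not Spark's actual code: the `Plan` nodes (`Empty`, `Scan`, `BroadcastStage`, `NullPadProject`, `RightOuterJoin`) and the `propagateEmpty` function are hypothetical stand-ins for Spark's plan nodes and the rule, and only a right outer join with an empty left side is modelled.

```scala
object EmptyJoinSketch {
  // Toy plan nodes; hypothetical stand-ins, not Spark classes.
  sealed trait Plan
  case object Empty extends Plan                             // an empty relation
  final case class Scan(name: String) extends Plan           // an ordinary child
  final case class BroadcastStage(child: Plan) extends Plan  // a materialized broadcast
  final case class NullPadProject(child: Plan) extends Plan  // pads the empty side with nulls
  final case class RightOuterJoin(left: Plan, right: Plan) extends Plan

  // A broadcast stage only yields a broadcast value, so it needs a join above it.
  def isBroadcastStage(p: Plan): Boolean = p match {
    case _: BroadcastStage => true
    case _ => false
  }

  def propagateEmpty(plan: Plan): Plan = plan match {
    // Left side empty: the join would normally collapse to the right side
    // padded with nulls, unless the right side is just a broadcast stage.
    case RightOuterJoin(Empty, right) if !isBroadcastStage(right) =>
      NullPadProject(right)
    // Otherwise (including the broadcast case) leave the plan untouched.
    case other => other
  }
}
```

In this model, `propagateEmpty(RightOuterJoin(Empty, Scan("t3")))` collapses the join, while `propagateEmpty(RightOuterJoin(Empty, BroadcastStage(Scan("t3"))))` returns the join unchanged. The second case corresponds to the "After this patch" plan, where the join is kept above the broadcast stage.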

Why are the changes needed?

Fix bug

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually tested `SPARK-39551: Invalid plan check - invalid broadcast query stage`: it passes with the original fix removed and this PR applied.

For added UT,

  test("SPARK-48155: AQEPropagateEmptyRelation check remained child for join") {
    withSQLConf(
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
      val (_, adaptivePlan) = runAdaptiveAndVerifyResult(
        """
          |SELECT /*+ BROADCAST(t3) */ t3.b, count(t3.a) FROM testData2 t1
          |INNER JOIN (
          |  SELECT * FROM testData2
          |  WHERE b = 0
          |  UNION ALL
          |  SELECT * FROM testData2
          |  WHERE b != 0
          |) t2
          |ON t1.b = t2.b AND t1.a = 0
          |RIGHT OUTER JOIN testData2 t3
          |ON t1.a > t3.a
          |GROUP BY t3.b
        """.stripMargin
      )
      assert(findTopLevelBroadcastNestedLoopJoin(adaptivePlan).size == 1)
      assert(findTopLevelUnion(adaptivePlan).size == 0)
    }
  }

Before this PR, the adaptive plan is:

*(9) HashAggregate(keys=[b#226], functions=[count(1)], output=[b#226, count(a)#228L])
+- AQEShuffleRead coalesced
   +- ShuffleQueryStage 3
      +- Exchange hashpartitioning(b#226, 5), ENSURE_REQUIREMENTS, [plan_id=356]
         +- *(8) HashAggregate(keys=[b#226], functions=[partial_count(1)], output=[b#226, count#232L])
            +- *(8) Project [b#226]
               +- BroadcastNestedLoopJoin BuildRight, RightOuter, (a#23 > a#225)
                  :- *(7) Project [a#23]
                  :  +- *(7) SortMergeJoin [b#24], [b#220], Inner
                  :     :- *(5) Sort [b#24 ASC NULLS FIRST], false, 0
                  :     :  +- AQEShuffleRead coalesced
                  :     :     +- ShuffleQueryStage 0
                  :     :        +- Exchange hashpartitioning(b#24, 5), ENSURE_REQUIREMENTS, [plan_id=211]
                  :     :           +- *(1) Filter (a#23 = 0)
                  :     :              +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
                  :     :                 +- Scan[obj#22]
                  :     +- *(6) Sort [b#220 ASC NULLS FIRST], false, 0
                  :        +- AQEShuffleRead coalesced
                  :           +- ShuffleQueryStage 1
                  :              +- Exchange hashpartitioning(b#220, 5), ENSURE_REQUIREMENTS, [plan_id=233]
                  :                 +- Union
                  :                    :- *(2) Project [b#220]
                  :                    :  +- *(2) Filter (b#220 = 0)
                  :                    :     +- *(2) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#219, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#220]
                  :                    :        +- Scan[obj#218]
                  :                    +- *(3) Project [b#223]
                  :                       +- *(3) Filter NOT (b#223 = 0)
                  :                          +- *(3) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#222, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#223]
                  :                             +- Scan[obj#221]
                  +- BroadcastQueryStage 2
                     +- BroadcastExchange IdentityBroadcastMode, [plan_id=260]
                        +- *(4) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#225, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#226]
                           +- Scan[obj#224]

After this patch

*(6) HashAggregate(keys=[b#226], functions=[count(1)], output=[b#226, count(a)#228L])
+- AQEShuffleRead coalesced
   +- ShuffleQueryStage 3
      +- Exchange hashpartitioning(b#226, 5), ENSURE_REQUIREMENTS, [plan_id=319]
         +- *(5) HashAggregate(keys=[b#226], functions=[partial_count(1)], output=[b#226, count#232L])
            +- *(5) Project [b#226]
               +- BroadcastNestedLoopJoin BuildRight, RightOuter, (a#23 > a#225)
                  :- LocalTableScan <empty>, [a#23]
                  +- BroadcastQueryStage 2
                     +- BroadcastExchange IdentityBroadcastMode, [plan_id=260]
                        +- *(4) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#225, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#226]
                           +- Scan[obj#224]
[info] - xxxx (3 seconds, 136 milliseconds)

Was this patch authored or co-authored using generative AI tooling?

No

AngersZhuuuu · 2024-05-10T08:51:14Z

ping @cloud-fan @maryannxue Pls take a look

dongjoon-hyun

Do you think you can provide test cases for this, @AngersZhuuuu ?

AngersZhuuuu · 2024-05-11T01:49:04Z

> Do you think you can provide test cases for this, @AngersZhuuuu ?

The existing test `SPARK-39551: Invalid plan check - invalid broadcast query stage` can cover this. I don't know whether we need to remove the ValidateSparkPlan rule; it's too weird and rough.

AngersZhuuuu · 2024-05-11T02:39:37Z

> Do you think you can provide test cases for this, @AngersZhuuuu ?

Added a new UT to show the difference, pls take a look again @dongjoon-hyun

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala

...catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

cloud-fan · 2024-05-14T05:22:06Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala

+  // Project
+  //   +- LogicalQueryStage(_, BroadcastQueryStage)
+  // Then after LogicalQueryStageStrategy, will only remain BroadcastQueryStage after project,
+  // the plan can't execute.


We can simply say

// A broadcast query stage can't be executed without the join operator.
// TODO: we can return the original query plan before broadcast.

How hard is it to return the original query plan? It seems not hard, as we just need to add a new `def returnSingleJoinSide` function in the base class and unwrap the broadcast stage in the AQE rule.
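
The suggestion above could be sketched roughly as follows. This is a toy illustration under stated assumptions, not the merged change: only the name `returnSingleJoinSide` comes from the comment, while its signature, the `Plan` nodes, `PropagateEmptyBase`, `StaticRule`, and `AdaptiveRule` are hypothetical, and the null padding of the empty side is omitted for brevity.

```scala
object UnwrapBroadcastSketch {
  // Toy plan nodes; hypothetical stand-ins, not Spark classes.
  sealed trait Plan
  case object Empty extends Plan
  final case class Scan(name: String) extends Plan
  final case class BroadcastStage(original: Plan) extends Plan
  final case class RightOuterJoin(left: Plan, right: Plan) extends Plan

  // Base rule: a hook decides what to return when only one join side survives.
  trait PropagateEmptyBase {
    def returnSingleJoinSide(side: Plan): Plan = side

    def apply(plan: Plan): Plan = plan match {
      case RightOuterJoin(Empty, right) => returnSingleJoinSide(right)
      case other => other
    }
  }

  // Non-adaptive variant: keeps the default hook.
  object StaticRule extends PropagateEmptyBase

  // Adaptive variant: hand back the plan from before the broadcast,
  // so no bare broadcast stage is left without a join above it.
  object AdaptiveRule extends PropagateEmptyBase {
    override def returnSingleJoinSide(side: Plan): Plan = side match {
      case BroadcastStage(original) => original
      case other => other
    }
  }
}
```

Under these assumptions, `AdaptiveRule(RightOuterJoin(Empty, BroadcastStage(Scan("t3"))))` yields `Scan("t3")`, whereas the default hook would leave a bare `BroadcastStage` behind.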

cloud-fan · 2024-05-14T09:32:22Z

thanks, merging to master!

[SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec

7165043

github-actions bot added the SQL label May 10, 2024

dongjoon-hyun reviewed May 10, 2024

Update AdaptiveQueryExecSuite.scala

c391832

cloud-fan reviewed May 13, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala

cloud-fan reviewed May 13, 2024

...catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala Outdated

cloud-fan reviewed May 13, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

AngersZhuuuu added 3 commits May 14, 2024 09:32

follow comment

a483b72

follow comment

353778b

.

0c7d86a

cloud-fan reviewed May 14, 2024

cloud-fan approved these changes May 14, 2024

AngersZhuuuu added 2 commits May 14, 2024 14:15

Update AQEPropagateEmptyRelation.scala

311ea00

Update AQEPropagateEmptyRelation.scala

269b175

cloud-fan approved these changes May 14, 2024

cloud-fan closed this in e5ad5e9 May 14, 2024
