[BUG][SPARK] Using a temp view in a MERGE INTO spark sql statement leads to MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT error #2940

Open · bug
scSenicourt opened this issue Apr 22, 2024 · 0 comments
scSenicourt commented Apr 22, 2024

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

Referring to a temp view in a Spark SQL MERGE INTO statement leads to a MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT exception. The exception claims that an attribute is missing, even though the attribute exists.

This used to work with Delta 2.2.0, Spark 3.3.1 and Scala 2.12.15.

Steps to reproduce

import io.delta.tables._

val location = "<your-preferred-location>"

val dfA = Seq(
    ("1", "10")
)
    .toDF("a", "b")
    .select($"a".cast("int"), $"b".cast("int"))
    
dfA
    .write
    .format("delta")
    .mode("overwrite")
    .save(s"${location}")

sql(s"select * from delta.`${location}`").createOrReplaceTempView("a")
    
val dfB = Seq(
    ("1", "20")
)
    .toDF("a", "b")
    .select($"a".cast("int"), $"b".cast("int"))
    
dfB.createOrReplaceTempView("b")

sql("""
	MERGE INTO a
	USING b
	ON a.a = b.a
	WHEN MATCHED THEN UPDATE SET *
""")

Observed results

org.apache.spark.sql.catalyst.ExtendedAnalysisException: [MISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT] Resolved attribute(s) "b" missing from "a" in operator !Project [a#51742, b#51743]. ; line 1 pos 0;
Project [a#51742, b#51743]
+- SubqueryAlias a
   +- !Project [a#51742, b#51743]
      +- SubqueryAlias spark_catalog.delta.`<your-preferred-location>`
         +- Relation [a#51742] parquet

Expected results

Table a is updated.
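
After the merge, the row with a = 1 should carry the value b = 20 from the source view. A minimal check, reusing location from the steps above:

sql(s"select * from delta.`${location}`").show()
// expected: a single row (a = 1, b = 20) instead of the original (a = 1, b = 10)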

Further details

For some reason, referring to the supposedly missing column seems to fix the issue:

sql("""
	MERGE INTO a
	USING b
	ON a.a = b.a
	AND b.b = b.b -- this fixes the issue
	WHEN MATCHED THEN UPDATE SET *
""") // this works

Referring to the Delta table directly in the SQL statement also avoids the issue:

sql("""
	MERGE INTO delta.`${location}` as a
	USING b
	ON a.a = b.a
	WHEN MATCHED THEN UPDATE SET *
""") // this works

Similarly, using the DataFrame API works without any issue.
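
For reference, a minimal sketch of the equivalent DataFrame API call (reusing spark, location and dfB from the steps to reproduce above):

import io.delta.tables.DeltaTable

DeltaTable.forPath(spark, location).as("a")
    .merge(dfB.as("b"), "a.a = b.a")
    .whenMatched()
    .updateAll() // the DataFrame API equivalent of UPDATE SET *
    .execute()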

Environment information

  • Delta Lake version: 3.1.0
  • Spark version: 3.1.0
  • Scala version: 2.12.17

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.