[BUG][Spark] Inconsistent casting behavior in insert #2988

Open
1 of 3 tasks
johanl-db opened this issue Apr 29, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@johanl-db
Collaborator

Bug

Describe the problem

Inserting data into an existing table behaves differently depending on whether you use SQL or Python/Scala, and on whether you use saveAsTable or insertInto.

If the ingested data uses a type that differs from the type of the column in the table, inserting with saveAsTable in Python/Scala fails, whereas the alternative ways to insert the data succeed:

  • SQL INSERT by position: succeeds
  • SQL INSERT by name: succeeds
  • Python/Scala insert by position (insertInto): succeeds
  • Python/Scala insert by name (saveAsTable): fails

Steps to reproduce

Using python:

Create table with type int

(
  spark.createDataFrame([[1]], "value: int")
    .write
    .mode("append")
    .format("delta")
    .saveAsTable("johan_lasperas.playground.append_by_name_cast")
)

Append data as long using saveAsTable - fails:

(
  spark.createDataFrame([[2]], "value: long")
    .write
    .mode("append")
    .format("delta")
    .saveAsTable("johan_lasperas.playground.append_by_name_cast")
)

AnalysisException: Failed to merge fields 'value' and 'value'

Append data as long using insertInto - succeeds:

(
  spark.createDataFrame([[2]], "value: long")
    .write
    .mode("append")
    .format("delta")
    .insertInto("johan_lasperas.playground.append_by_name_cast")
)
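
Because insertInto resolves columns by position rather than by name, the append above only lands in the right place when the DataFrame's columns are already in the table's order. The following is a minimal sketch of one way to guard against misalignment; `columns_in_table_order` is a hypothetical helper, not part of Spark or Delta Lake, and it assumes case-insensitive column matching.

```python
def columns_in_table_order(source_columns, table_columns):
    """Return the source column names reordered to follow the table's order.

    Hypothetical helper: names are matched case-insensitively. Raises
    ValueError when the two column sets differ, because a by-position
    insert would otherwise misalign values silently.
    """
    by_lower = {name.lower(): name for name in source_columns}
    missing = [c for c in table_columns if c.lower() not in by_lower]
    if missing or len(source_columns) != len(table_columns):
        raise ValueError(
            f"column mismatch: source={list(source_columns)}, "
            f"table={list(table_columns)}"
        )
    return [by_lower[c.lower()] for c in table_columns]


# Possible usage with an existing SparkSession `spark` and DataFrame `df`:
# table = "johan_lasperas.playground.append_by_name_cast"
# ordered = columns_in_table_order(df.columns, spark.table(table).columns)
# df.select(*ordered).write.mode("append").format("delta").insertInto(table)
```

The helper only works on column names, so it can be checked without a Spark session.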

Append data as long using SQL INSERT by position - succeeds:

INSERT INTO johan_lasperas.playground.append_by_name_cast VALUES (CAST(3 AS LONG))

Append data as long using SQL INSERT by name - succeeds:

INSERT INTO johan_lasperas.playground.append_by_name_cast (value) VALUES (CAST(4 AS LONG))

Observed results

Appending data with saveAsTable using a type different from the type of the column in the table fails. It should succeed.
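
Until the behaviors are aligned, one possible workaround is to cast the DataFrame columns to the table's types explicitly before calling saveAsTable. This is only a sketch under assumptions, not the fix for the bug: `build_cast_exprs` is a hypothetical helper that builds strings for `DataFrame.selectExpr` from the `(name, type)` pairs returned by `DataFrame.dtypes`. Note that a narrowing cast such as long to int may overflow or raise, depending on Spark's ANSI settings.

```python
def build_cast_exprs(source_columns, table_dtypes):
    """Build selectExpr strings that cast source columns to the table's types.

    Hypothetical helper. `table_dtypes` is a list of (name, type) pairs,
    the shape returned by DataFrame.dtypes, e.g. [("value", "int")].
    Columns with no match in the table are passed through unchanged.
    """
    target = {name.lower(): dtype for name, dtype in table_dtypes}
    exprs = []
    for col in source_columns:
        dtype = target.get(col.lower())
        if dtype is None:
            exprs.append(f"`{col}`")
        else:
            exprs.append(f"CAST(`{col}` AS {dtype}) AS `{col}`")
    return exprs


# Possible usage with an existing SparkSession `spark` and DataFrame `df`:
# table = "johan_lasperas.playground.append_by_name_cast"
# exprs = build_cast_exprs(df.columns, spark.table(table).dtypes)
# (
#   df.selectExpr(*exprs)
#     .write
#     .mode("append")
#     .format("delta")
#     .saveAsTable(table)
# )
```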

Further details

Environment information

  • Delta Lake version: 3.1
  • Spark version: 3.4
  • Scala version: 2.13

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
@johanl-db johanl-db added the bug Something isn't working label Apr 29, 2024