
[Question] PySpark: downgrade Delta table protocol by rewriting the table (Delta 2.2.0) #2965

Open
keen85 opened this issue Apr 24, 2024 · 0 comments

keen85 commented Apr 24, 2024

Hi, I have a Delta table where column mapping is enabled (minReaderVersion=2, minWriterVersion=5).
There is a reading application that does not yet understand this feature/protocol, so I thought I could simply downgrade the Delta table to a lower protocol by rewriting the entire table (accepting that RENAME/DROP COLUMN would no longer work):

(
	spark
	.table("delta_table")
	.write
	.format("delta")
	.mode("overwrite")
	.option("overwriteSchema", "True")
	.saveAsTable("delta_table")
)

However, the resulting table still has minReaderVersion=2, minWriterVersion=5.

Inconsistently, when a different (new) table name is passed to saveAsTable(), the resulting table has minReaderVersion=1, minWriterVersion=2.

Is this behavior intended, or is it a bug?

Is there a way to explicitly set the protocol / table properties as part of the df.write operation?
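In the meantime, the second observation above suggests a workaround: rewrite the data under a fresh table name (which, per the repro below, comes out with minReaderVersion=1 / minWriterVersion=2) and then swap it into place. A minimal sketch of the SQL sequence, assuming the metastore permits DROP TABLE plus ALTER TABLE ... RENAME TO; the table and staging names are placeholders:

```python
def protocol_downgrade_swap_sql(table: str, staging: str) -> list:
    """Sketch of a swap-based downgrade: writing into a *new* Delta table
    resets the protocol to the default (1/2), so copy the data into a
    staging table, drop the original, and rename the staging table back.
    The names are hypothetical; each statement would be run via spark.sql().
    """
    return [
        # new table => created with the default (lowest) protocol
        f"CREATE TABLE {staging} USING DELTA AS SELECT * FROM {table}",
        # drop the old table that still carries the upgraded protocol
        f"DROP TABLE {table}",
        # move the rewritten table into place under the original name
        f"ALTER TABLE {staging} RENAME TO {table}",
    ]
```

Note this loses the original table's history, and for a managed table the RENAME also moves the underlying files, so it is only a sketch, not a drop-in fix.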

I am using Spark 3.3 + Delta 2.2.0.
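For anyone debugging this: the protocol a Delta table actually carries can be checked independently of DeltaTable.detail() by reading the transaction log directly. Each commit under _delta_log/ is a newline-delimited JSON file, and a protocol change appears as a `protocol` action. A minimal pure-Python sketch (it ignores checkpoints, so it assumes the protocol action is still present in the JSON commit files):

```python
import json
from pathlib import Path

def read_delta_protocol(table_path: str):
    """Scan the _delta_log JSON commits in order and return the last
    (minReaderVersion, minWriterVersion) protocol action seen, or None."""
    log_dir = Path(table_path) / "_delta_log"
    protocol = None
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "protocol" in action:
                p = action["protocol"]
                protocol = (p["minReaderVersion"], p["minWriterVersion"])
    return protocol
```

Running this against the table path should report (2, 5) after the upgrade and confirm whether the overwrite really left the protocol untouched.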

Full code to reproduce:

from delta.tables import DeltaTable

# setup: create table with some rows
dt = (
    DeltaTable.createOrReplace(spark)
        .tableName("test_delta_table_properties")
        .addColumn("id", "BIGINT")
        .addColumn("product_type", "STRING")
        .addColumn("sales", "BIGINT")
    .execute()
)

spark.sql("""
    INSERT INTO test_delta_table_properties (id, product_type, sales) VALUES
        (1, 'a', 1000),
        (2, 'b', 2000),
        (3, 'cc', 30000)
""")
print("initial")
DeltaTable.forName(spark, "test_delta_table_properties").detail().select("minReaderVersion", "minWriterVersion").show(truncate=False, vertical=True)

# setup: upgrade protocol
spark.sql("ALTER TABLE test_delta_table_properties SET TBLPROPERTIES ('delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5', 'delta.columnMapping.mode' = 'name')")
print("after protocol upgrade")
DeltaTable.forName(spark, "test_delta_table_properties").detail().select("minReaderVersion", "minWriterVersion").show(truncate=False, vertical=True)

# try overwriting table ==> protocol stays the same
(
    spark
    .table("test_delta_table_properties")
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "True")
    .saveAsTable("test_delta_table_properties")
)
print("after overwriting table")
DeltaTable.forName(spark, "test_delta_table_properties").detail().select("minReaderVersion", "minWriterVersion").show(truncate=False, vertical=True)

# writing table to *new* table ==> protocol is reset to lowest
(
    spark
    .table("test_delta_table_properties")
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "True")
    .saveAsTable("test_delta_table_properties_lower_protocol")
)
print("after write to *new* table")
DeltaTable.forName(spark, "test_delta_table_properties_lower_protocol").detail().select("minReaderVersion", "minWriterVersion").show(truncate=False, vertical=True)

Result:

initial
-RECORD 0---------------
 minReaderVersion | 1   
 minWriterVersion | 2   

after protocol upgrade
-RECORD 0---------------
 minReaderVersion | 2   
 minWriterVersion | 5   

after overwriting table
-RECORD 0---------------
 minReaderVersion | 2   
 minWriterVersion | 5   

after write to *new* table
-RECORD 0---------------
 minReaderVersion | 1   
 minWriterVersion | 2 