-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Retain Glue Catalog column comment after updating Iceberg table #10276
base: main
Are you sure you want to change the base?
Conversation
I built the jars and manually tested on Amazon EMR as a Spark runtime. |
List<Column> columns = toColumns(metadata); | ||
if (existingTable != null) { | ||
List<Column> existingColumns = existingTable.storageDescriptor().columns(); | ||
Map<String, Column> existingColumnMap = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making this a map of column name to comment here. Since we aren't using any additional information from the existing comments
@@ -501,6 +501,97 @@ public void testColumnCommentsAndParameters() { | |||
assertThat(actualColumns).isEqualTo(expectedColumns); | |||
} | |||
|
|||
@Test | |||
public void testGlueTableColumnCommentsPreserved() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the case where we drop a column from a table. I believe today we keep those columns and descriptions. Should we add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, @lawofcycles! I've taken an initial pass. Thanks for the solid work!
@geruh Thank you for your review! |
Optional.ofNullable(properties.get(GLUE_DESCRIPTION_KEY)) | ||
.ifPresent(tableInputBuilder::description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lawofcycles thank you for your patch!
Now this method has the existing table information, so I think it would be better to move the function to keep the table description https://github.com/aajisaka/iceberg/blob/58c61241bd61de7a4062dbf664691f05eeb2ea53/aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java#L319-L321 into this method.
What this PR does
This PR addresses the issue where column comments in the Glue catalog are not retained when updating an existing Iceberg table schema. It fixes the problem by modifying the IcebergToGlueConverter.setTableInputInformation method to preserve existing column comments from the Glue table when creating a new TableInput for an update operation.
When an existing Glue table is provided, the modified method retrieves the comments from its columns and applies them to the corresponding columns in the new TableInput, but only if the Iceberg table's column does not already have a comment.
This change ensures that any user-defined column comments in the Glue catalog are carried over during table updates, preventing unintentional loss of documentation.
Testing
Added a new unit test method testSetTableInputInformationWithExistingTable to verify that column comments from an existing Glue table are correctly preserved when creating a new TableInput.
I'll also add Integration Tests same as #10199.
Fixes #10220