Upgrade AWS SDK to V2 #2972

Open · lliangyu-lin wants to merge 3 commits into master
Conversation

@lliangyu-lin commented Apr 25, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Storage
  • storageS3DynamoDB

Description

The AWS SDK for Java 1.x is being deprecated: it will enter maintenance mode on July 31, 2024, and end-of-support is effective December 31, 2025.
To get ahead of the deprecation, we need to upgrade Delta from AWS SDK for Java 1.x to AWS SDK for Java 2.x.
SDK v2 is a major rewrite of the 1.x code base. For the detailed differences, please refer to What's different between the AWS SDK for Java 1.x and 2.x.

List of files in the delta main branch that currently use AWS SDK v1 APIs: https://github.com/search?q=repo%3Adelta-io%2Fdelta%20com.amazonaws&type=code. These are the files that need to be updated for this upgrade.

Note: Part of this patch is based upon another open PR: https://github.com/delta-io/delta/pull/2408/files.
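For illustration, a minimal sketch of what the client migration typically looks like, assuming the DynamoDB client used by S3DynamoDBLogStore (this is the general v1 -> v2 pattern, not the exact diff in this patch):

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

public class ClientMigrationSketch {
    public static void main(String[] args) {
        // AWS SDK for Java 1.x (the deprecated path being removed):
        //   AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
        //       .withRegion("us-east-1")
        //       .build();

        // AWS SDK for Java 2.x equivalent: builder() instead of standard(),
        // a typed Region instead of a string, and the new
        // software.amazon.awssdk package namespace.
        DynamoDbClient client = DynamoDbClient.builder()
                .region(Region.US_EAST_1)
                .build();
        client.close(); // v2 clients implement AutoCloseable
    }
}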

How was this patch tested?

Unit Test

  • build/sbt storageS3DynamoDB/test: passing
  • build/sbt storage/test: passing

Integration Test

run-integration-tests.py --s3-log-store-util-only
[info] - setup empty delta log
[info] - empty
[info] - small
[info] - medium
[info] - large
[info] S3LogStoreUtilTest:
[info] Run completed in 22 seconds, 503 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed Apr 23, 2024, 9:36:04 AM

Manual Testing

spark-sql \
--conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log1 \
--conf spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--jars /usr/share/aws/delta/lib/delta-storage-s3-dynamodb.jar
CREATE TABLE my_delta_table_1 (
id INT,
value INT
) USING delta;

INSERT INTO my_delta_table_1
VALUES
(1, 100),
(2, 200),
(3, 300),
(4, 400),
(5, 500),
(6, 600),
(7, 700),
(8, 800),
(9, 900),
(10, 1000);

select * from my_delta_table_1;
6	600
7	700
3	300
4	400
5	500
6	600
7	700
8	800
9	900
10	1000
3	300
4	400
5	500
8	800
9	900
10	1000
1	100
2	200
1	100
2	200
Time taken: 1.175 seconds, Fetched 20 row(s)

Does this PR introduce any user-facing changes?

Yes. Users will need to specify an SDK v2 credentials provider instead of an SDK v1 provider in Delta storage configurations.
Ex: io.delta.storage.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider -> software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
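For context, a minimal sketch of why the config value changes (illustrative, not code from this patch): v1 providers are typically constructed directly, while v2 providers expose static factory methods, so the two class names are not interchangeable:

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;

public class ProviderSketch {
    public static void main(String[] args) {
        // AWS SDK for Java 1.x (old config value), constructor-based:
        //   new com.amazonaws.auth.profile.ProfileCredentialsProvider();

        // AWS SDK for Java 2.x (new config value), static create() factory:
        ProfileCredentialsProvider provider = ProfileCredentialsProvider.create();
        provider.resolveCredentials(); // resolves credentials from the default profile
    }
}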
