
Make filesystem destination work with gcs in s3 compatibility mode #1272

Open
sh-rp opened this issue Apr 24, 2024 · 3 comments
Labels: destination (Issue related to new destinations), enhancement (New feature or request), tech-debt (Leftovers from previous sprint that should be fixed over time)

Comments

sh-rp commented Apr 24, 2024

GCS has an S3 interoperability mode which emulates the S3 API, so users could use the s3 filesystem implementation with GCS buckets. This is useful when loading to a final destination that supports connecting to S3 buckets but not to GCS, such as ClickHouse. ClickHouse currently has a custom implementation to handle this, but it would be nice if our filesystem destination could be configured to use the s3 implementation with GCS buckets; right now the gcs implementation is always selected automatically based on the bucket url.
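
For illustration, the desired setup might look something like this sketch (assuming the s3 implementation honors `endpoint_url` the way it already does for other S3-compatible stores such as MinIO and R2; the bucket name and keys are placeholders):

```toml
# Sketch: force the s3 implementation against GCS's S3 interoperability
# endpoint using HMAC keys. Bucket name and keys are placeholders.
[destination.filesystem]
bucket_url = "s3://my-gcs-bucket"

[destination.filesystem.credentials]
aws_access_key_id = "..."       # GCS HMAC access key
aws_secret_access_key = "..."   # GCS HMAC secret
endpoint_url = "https://storage.googleapis.com"
```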

If this is done, we can update the ClickHouse destination and remove the specialized implementation.

sh-rp mentioned this issue Apr 24, 2024
rudolfix commented

@sh-rp is this the same as #1181? We already support any S3-compatible storage, i.e. R2 and MinIO also work. What is different here? It would be good to specify what needs to be done :)

Pipboyguy added the enhancement, tech-debt and destination labels May 7, 2024
Pipboyguy commented

@sh-rp In ClickHouse, the gcs table function is an alias of the s3 table function, so we don't seem to have an option other than the HMAC key authentication mechanism that's currently implemented.

Would #1181 be a solution to this by extending the GcpCredentials class?
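
For illustration only, such an extension might look roughly like this sketch (hypothetical, not existing dlt API: the class name, the HMAC fields, and `to_aws_credentials` are made up; only `configspec`, `AwsCredentialsWithoutDefaults`, and `GcpServiceAccountCredentials` are real dlt names):

```python
from typing import Optional

from dlt.common.configuration import configspec
from dlt.common.configuration.specs import (
    AwsCredentialsWithoutDefaults,
    GcpServiceAccountCredentials,
)

# Hypothetical sketch, not the dlt API: a GCP credentials variant that also
# carries a GCS interoperability HMAC key pair and can present itself as
# AWS-style credentials for the s3 code path.
@configspec
class GcpCredentialsWithHmac(GcpServiceAccountCredentials):
    gcp_access_key_id: Optional[str] = None
    gcp_secret_access_key: Optional[str] = None

    def to_aws_credentials(self) -> AwsCredentialsWithoutDefaults:
        # Point the HMAC pair at the GCS S3 interoperability endpoint.
        return AwsCredentialsWithoutDefaults(
            aws_access_key_id=self.gcp_access_key_id,
            aws_secret_access_key=self.gcp_secret_access_key,
            endpoint_url="https://storage.googleapis.com",
        )
```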


Pipboyguy commented May 20, 2024

@sh-rp @rudolfix if we remove GcsCredentials entirely from the storage auth flow like so:

```python
from dlt.common.configuration.specs import AwsCredentialsWithoutDefaults
from dlt.common.destination.exceptions import DestinationTransientException  # import paths may differ across dlt versions

# Accept only AWS-style HMAC credentials for both the s3 and gcs schemes:
if bucket_scheme in ("s3", "gs", "gcs"):
    if not isinstance(staging_credentials, AwsCredentialsWithoutDefaults):
        raise DestinationTransientException(
            "Staging data to S3 or Google Cloud Storage requires providing AWS-style HMAC access keys, even if using a "
            "non-AWS storage provider. Please make sure to configure your destination with an `aws_access_key_id` and `aws_secret_access_key`. "
            "See https://dlthub.com/docs/general-usage/destination#pass-explicit-parameters-and-a-name-to-a-destination "
            "for documentation on how to set up your destination with the required access keys."
        )
```

then the current gcs tests will fail, at least until we have implemented named destinations in tests so that we can separate the s3 and gcs tests. So I'll leave things as is and only implement one specialised test for the gcs HMAC flow:

```toml
[destination.clickhouse.credentials]
...
gcp_access_key_id = "..."
gcp_secret_access_key = "..."

[destination.filesystem_s3_gcs_comp.credentials]
aws_access_key_id = "..."
aws_secret_access_key = "..."
```

Obviously this will leave a duplicate definition of the same keys in the config until we get named destinations support into the tests. Please advise if this is acceptable.

Project status: In Progress
