Create public fs_client interface (similar to the sql_client interface) on class Pipeline: #1282

Open
sh-rp opened this issue Apr 25, 2024 · 0 comments

Comments

@sh-rp
Collaborator

sh-rp commented Apr 25, 2024

Our Pipeline object exposes a sql_client() method that gives direct access to the current main destination's storage when the destination is one of the SQL destinations. It would be useful to have the same for the filesystem destination. The idea is to wrap the native fsspec implementation and expose only the methods we know work with all filesystem destinations, plus some additional dlt-specific ones. Some work has already been done: see the private .fs_client() method on the pipeline and the FSClientBase interface it returns. We still have to define exactly what we want to give the user; for now there are some methods to get the directories and files that correspond to a table in the schema.
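As a rough illustration of the "restricted wrapper" idea, here is a minimal sketch. It is not dlt's FSClientBase: the class name TableFSClient and its methods are hypothetical, the layout (one folder per table under a dataset directory) is an assumption, and the local disk accessed through os stands in for the wrapped fsspec filesystem so the example runs without extra dependencies.

```python
import os
from typing import List


class TableFSClient:
    """Hypothetical wrapper that exposes only a small, table-oriented read API.

    Assumption: each table's files live in a folder named after the table,
    directly under the dataset directory. The local disk stands in for the
    fsspec filesystem a real implementation would wrap.
    """

    def __init__(self, dataset_path: str) -> None:
        # kept private so callers cannot reach arbitrary filesystem methods
        self._dataset_path = dataset_path

    def get_table_dir(self, table_name: str) -> str:
        # path of the table folder, relative to the dataset directory
        return table_name

    def list_table_files(self, table_name: str) -> List[str]:
        # relative paths of the regular files in the table folder, sorted
        table_dir = self.get_table_dir(table_name)
        full_dir = os.path.join(self._dataset_path, table_dir)
        return sorted(
            os.path.join(table_dir, name)
            for name in os.listdir(full_dir)
            if os.path.isfile(os.path.join(full_dir, name))
        )

    def read_bytes(self, relative_path: str) -> bytes:
        # read one file addressed by a path returned from list_table_files
        with open(os.path.join(self._dataset_path, relative_path), "rb") as f:
            return f.read()
```

Keeping the wrapped filesystem private is the point of the design: callers only see operations that can be guaranteed across destinations, and dlt-specific helpers (such as resolving a schema table to its folder) sit on the same object.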

Some ideas:

  • The endpoints that return paths should be able to return absolute paths (s3://... or c:\bla...) when requested.
  • The endpoints for reading text or bytes should be able to gunzip the file from the destination when requested.
  • We could support returning DuckDB SQL statements that create tables from a bucket or folder. For buckets, this would also include statements that set up the S3 credentials, for example.
  • We could also return objects that produce Arrow tables for a single table file or for a full table.
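The first three ideas can be sketched as plain helper functions. This is a hedged, standalone sketch, not dlt code: the function names, the s3_credentials dict keys, the secret name dlt_s3, and the format-to-reader mapping are all hypothetical. The generated SQL assumes DuckDB 0.10 or later (for CREATE SECRET) and the httpfs extension for s3:// paths. The Arrow idea is not covered here.

```python
import gzip
import os
from typing import Dict, List, Optional


def resolve_path(bucket_url: str, relative_path: str, absolute: bool = False) -> str:
    """Return the stored relative path, or a fully qualified one on request."""
    if not absolute:
        return relative_path
    if "://" in bucket_url:
        # remote bucket: join with forward slashes, e.g. s3://bucket/dataset/items/...
        return bucket_url.rstrip("/") + "/" + relative_path.lstrip("/")
    # local folder: let the OS build the absolute path (c:\... on Windows)
    return os.path.abspath(os.path.join(bucket_url, relative_path))


def read_bytes(path: str, decompress: bool = False) -> bytes:
    """Read a file, gunzipping its content only when asked to."""
    with open(path, "rb") as f:
        data = f.read()
    return gzip.decompress(data) if decompress else data


def read_text(path: str, decompress: bool = False, encoding: str = "utf-8") -> str:
    return read_bytes(path, decompress=decompress).decode(encoding)


def _sql_literal(value: str) -> str:
    # escape single quotes so the value is safe inside a SQL string literal
    return "'" + value.replace("'", "''") + "'"


def duckdb_statements(
    table_name: str,
    table_glob: str,
    file_format: str = "parquet",
    s3_credentials: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build DuckDB SQL that creates one table from the files matching a glob."""
    readers = {"parquet": "read_parquet", "jsonl": "read_json_auto", "csv": "read_csv_auto"}
    statements: List[str] = []
    if s3_credentials:
        # credentials statement comes first so the CREATE TABLE can read the bucket
        statements.append(
            "CREATE SECRET dlt_s3 (TYPE S3, "
            f"KEY_ID {_sql_literal(s3_credentials['key_id'])}, "
            f"SECRET {_sql_literal(s3_credentials['secret'])}, "
            f"REGION {_sql_literal(s3_credentials['region'])});"
        )
    # quote the table name as a SQL identifier
    ident = '"' + table_name.replace('"', '""') + '"'
    statements.append(
        f"CREATE TABLE {ident} AS SELECT * FROM "
        f"{readers[file_format]}({_sql_literal(table_glob)});"
    )
    return statements
```

Usage: duckdb_statements("items", "s3://bucket/dataset/items/*.parquet", s3_credentials={...}) returns a CREATE SECRET statement followed by a CREATE TABLE ... AS SELECT * FROM read_parquet(...) statement, which a user could run in their own DuckDB session. Returning SQL text rather than executing it keeps DuckDB an optional dependency.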
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

1 participant