Extend Databricks Operators to DBFS Interaction #39262
Comments
Maybe it would also be a good idea to implement DBFS on top of Object Storage.
And I'm just wondering: why not implement it on top of the official SDK? A note about production usage, from https://docs.databricks.com/en/dev-tools/sdk-python.html: "This feature is in Beta and is okay to use in production. During the Beta period, Databricks recommends that you pin a dependency on the specific minor version of the Databricks SDK for Python that your code depends on."
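For comparison with the REST-based approach, here is a minimal sketch of DBFS helpers written against the SDK. It assumes, as I understand the SDK, that `WorkspaceClient().dbfs` exposes `upload(path, src, *, overwrite=...)`, `download(path)` returning a readable binary stream, and `list(path)` yielding objects with a `.path` attribute; the helper names are hypothetical, and the calls should be checked against whichever SDK minor version you pin.

```python
"""Sketch: DBFS helpers on top of the Databricks SDK for Python (assumed API)."""
import io
from typing import Any, List


def upload_bytes(dbfs: Any, path: str, data: bytes, overwrite: bool = True) -> None:
    # `dbfs` is assumed to be `WorkspaceClient().dbfs`; `upload` takes a
    # binary file-like object, so wrap the raw bytes in BytesIO.
    dbfs.upload(path, io.BytesIO(data), overwrite=overwrite)


def download_bytes(dbfs: Any, path: str) -> bytes:
    # `download` is assumed to return a binary stream usable as a context manager.
    with dbfs.download(path) as stream:
        return stream.read()


def list_paths(dbfs: Any, path: str) -> List[str]:
    # `list` is assumed to yield FileInfo-like objects exposing `.path`.
    return [info.path for info in dbfs.list(path)]
```

With the SDK installed, you would pass `WorkspaceClient().dbfs` as the `dbfs` argument; taking the client as a parameter also keeps the helpers easy to test with a fake.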
Hi @Taragolis. Thanks for your reply! TBH, I wasn't aware that Object Storage existed. It seems many of the things I've implemented are already there. The only thing I cannot find is some sort of … With respect to the SDK, sounds good to me. However, the whole plugin calls the REST endpoints directly. I think it may be better to stick to one strategy (either switch everything to the SDK or extend it using the REST API).
Airflow ObjectStorage is built on top of the …
Small nit: this is about an Airflow provider, not an Airflow plugin; those are somewhat different things.
In the long run, the SDK should replace internal solutions; that is why I propose using the SDK rather than calling the API directly.
Absolutely agree with the idea! I think that's quite a deep change, though, and I'm not sure how that's handled, or whether it should rather be part of another ticket (i.e., more of a refactoring ticket than a feature one).
@Taragolis @eladkal should I move forward with this as originally posted, or do you have something different in mind?
Description
Create operators and a hook to interact with Databricks' DBFS (https://docs.databricks.com/api/workspace/dbfs).
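To make the proposal concrete, here is a minimal sketch of what such a hook could look like. It borrows the `(method, endpoint)` tuple style of the existing Databricks hooks and takes the HTTP call as an injected `api_call` callable, standing in for `BaseDatabricksHook._do_api_call`. `DbfsHookSketch`, the constant names, and the exact endpoint strings and verbs are hypothetical, based on my reading of the DBFS API reference, and should be verified there.

```python
"""Hypothetical DBFS hook sketch over the DBFS REST API 2.0."""
import base64
from typing import Any, Callable, Dict, List, Optional, Tuple

# (method, endpoint) pairs; verbs and paths are assumptions to check against
# https://docs.databricks.com/api/workspace/dbfs
LIST_ENDPOINT = ("GET", "api/2.0/dbfs/list")
GET_STATUS_ENDPOINT = ("GET", "api/2.0/dbfs/get-status")
MKDIRS_ENDPOINT = ("POST", "api/2.0/dbfs/mkdirs")
DELETE_ENDPOINT = ("POST", "api/2.0/dbfs/delete")
PUT_ENDPOINT = ("POST", "api/2.0/dbfs/put")

ApiCall = Callable[[Tuple[str, str], Optional[Dict[str, Any]]], Dict[str, Any]]


class DbfsHookSketch:
    """Thin DBFS wrapper; `api_call` is a stand-in for the provider's HTTP layer."""

    def __init__(self, api_call: ApiCall) -> None:
        self._api_call = api_call

    def list_files(self, path: str) -> List[Dict[str, Any]]:
        # An empty directory may come back without a "files" key.
        return self._api_call(LIST_ENDPOINT, {"path": path}).get("files", [])

    def get_status(self, path: str) -> Dict[str, Any]:
        return self._api_call(GET_STATUS_ENDPOINT, {"path": path})

    def mkdirs(self, path: str) -> None:
        self._api_call(MKDIRS_ENDPOINT, {"path": path})

    def delete(self, path: str, recursive: bool = False) -> None:
        self._api_call(DELETE_ENDPOINT, {"path": path, "recursive": recursive})

    def put(self, path: str, data: bytes, overwrite: bool = False) -> None:
        # Inline contents are base64-encoded; this single-call upload is only
        # meant for small files (the docs cap inline contents at about 1 MB).
        payload = {
            "path": path,
            "contents": base64.b64encode(data).decode("ascii"),
            "overwrite": overwrite,
        }
        self._api_call(PUT_ENDPOINT, payload)
```

Injecting the call keeps the sketch independent of whether the final hook talks to the REST API directly or wraps the SDK, which is the open question in the comments.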
Use case/motivation
As of the latest Databricks provider (https://github.com/apache/airflow/tree/main/airflow/providers/databricks), there is no way to interact with the DBFS API.
Since I had to do this at work (and my implementation is fairly mature), I thought it would be a good idea to share it with the community.
So far, I've got:
BaseDatabricksHook
As part of the PR, I'd add:
Please let me know whether you consider this a relevant contribution.
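One thing a DBFS hook would likely need beyond single calls is handling files larger than the inline-upload limit. Below is a minimal sketch, assuming the streaming endpoints `create`, `add-block`, and `close` for writes, plus `get-status` and ranged `read` for reads, with a 1 MB per-request cap on raw data. The function names, constants, verbs, and the injected `api_call` callable are hypothetical and should be checked against the DBFS API reference.

```python
"""Hypothetical sketch: moving large files through the DBFS streaming endpoints."""
import base64
from typing import Any, Callable, Dict, List, Optional, Tuple

# (method, endpoint) pairs; verbs and paths are assumptions to verify
# against https://docs.databricks.com/api/workspace/dbfs
CREATE_ENDPOINT = ("POST", "api/2.0/dbfs/create")
ADD_BLOCK_ENDPOINT = ("POST", "api/2.0/dbfs/add-block")
CLOSE_ENDPOINT = ("POST", "api/2.0/dbfs/close")
GET_STATUS_ENDPOINT = ("GET", "api/2.0/dbfs/get-status")
READ_ENDPOINT = ("GET", "api/2.0/dbfs/read")

# Assumed per-request limit of 1 MB of raw (pre-base64) data.
MAX_BLOCK = 1024 * 1024

ApiCall = Callable[[Tuple[str, str], Optional[Dict[str, Any]]], Dict[str, Any]]


def upload_large(api_call: ApiCall, path: str, data: bytes,
                 overwrite: bool = False, block_size: int = MAX_BLOCK) -> None:
    """Upload `data` via create, then repeated add-block, then close."""
    handle = api_call(CREATE_ENDPOINT, {"path": path, "overwrite": overwrite})["handle"]
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        # Each block carries at most `block_size` raw bytes, base64-encoded.
        api_call(ADD_BLOCK_ENDPOINT,
                 {"handle": handle, "data": base64.b64encode(chunk).decode("ascii")})
    # Closing the handle finishes the upload.
    api_call(CLOSE_ENDPOINT, {"handle": handle})


def read_all(api_call: ApiCall, path: str, block_size: int = MAX_BLOCK) -> bytes:
    """Read a whole file with ranged reads, using get-status for the size."""
    size = api_call(GET_STATUS_ENDPOINT, {"path": path})["file_size"]
    chunks: List[bytes] = []
    offset = 0
    while offset < size:
        resp = api_call(READ_ENDPOINT,
                        {"path": path, "offset": offset, "length": block_size})
        chunk = base64.b64decode(resp.get("data", ""))
        if not chunk:  # stop rather than loop forever on an empty response
            break
        chunks.append(chunk)
        offset += len(chunk)
    return b"".join(chunks)
```

Sizing the read loop from `get-status` avoids relying on how the server answers a read at end-of-file.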
Related issues
As one of the DBFS API endpoints uses PUT as its verb, I'd need to include a modification in BaseDatabricksHook, because it does not support PUT at the moment (see https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks_base.py#L584).
Are you willing to submit a PR?
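Supporting PUT is mostly a matter of accepting one more verb and making sure its payload is sent as a JSON body rather than as query parameters. The provider itself uses `requests`; the sketch below uses only the standard library (`urllib.request`) so it stands alone, and is not a patch to `BaseDatabricksHook`. `build_request`, `BODY_METHODS`, and `ALLOWED_METHODS` are hypothetical names, and `host` is assumed to be a bare hostname.

```python
"""Sketch: verb-aware request building that also accepts PUT (stdlib only)."""
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Verbs whose payload travels as a JSON body; the rest use a query string.
BODY_METHODS = {"POST", "PATCH", "PUT"}
ALLOWED_METHODS = BODY_METHODS | {"GET", "DELETE"}


def build_request(host: str, method: str, endpoint: str,
                  payload: Optional[Dict[str, Any]] = None,
                  token: str = "") -> urllib.request.Request:
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unexpected HTTP method: {method}")
    url = f"https://{host}/{endpoint.lstrip('/')}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload and method in BODY_METHODS:
        # POST/PATCH/PUT: serialize the payload as a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    elif payload:
        # GET (and, in this sketch, DELETE): encode the payload as query parameters.
        url += "?" + urllib.parse.urlencode(payload)
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

A table of allowed verbs plus a single "does this verb carry a body" check keeps the two decisions in one place, so adding a verb later means touching one set rather than an if/elif chain.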
Code of Conduct