Add load_delta support with deltalake package #11960

Open
1 of 22 tasks
ion-elgreco opened this issue May 9, 2024 · 2 comments
Labels
  • area/tracking: Tracking service, tracking client APIs, autologging
  • enhancement: New feature or request
  • help wanted: We would like help from the community to add this support

Comments

@ion-elgreco

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

load_delta should be added using the deltalake library. I am happy to contribute this myself since I work on the delta-rs project.

load_delta should return Polars or pandas datasets, and Polars should be added as an optional dependency.

Motivation

What is the use case for this feature?

Spark-delta is far too heavy for most use cases, hence the popularity of the Python deltalake package.

Why is this use case valuable to support for MLflow users in general?

Support for single-node ML use cases with Polars pipelines and deltalake as the storage layer.

Why is this use case valuable to support for your project(s) or organization?

It would avoid going through pandas to interact with MLflow and would improve tracking of deltalake table inputs.

Why is it currently difficult to achieve this use case?

Details

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@ion-elgreco ion-elgreco added the enhancement New feature or request label May 9, 2024
@github-actions github-actions bot added the area/tracking Tracking service, tracking client APIs, autologging label May 9, 2024
@ion-elgreco
Author

I can add load_polars and load_pandas to the class mlflow.data.delta_dataset_source.DeltaDatasetSource.
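
For illustration only, here is a minimal sketch of what two such loaders could look like when backed by the deltalake package instead of Spark. The class name DeltaLakeSourceSketch and its constructor arguments are hypothetical and are not part of MLflow; only DeltaTable, to_pandas, to_pyarrow_table, and polars.from_arrow are existing library calls. Both libraries are imported lazily so they can remain optional dependencies.

```python
from typing import Any, Optional


class DeltaLakeSourceSketch:
    """Hypothetical stand-in for a Spark-free Delta source; not an MLflow class."""

    def __init__(self, path: str, version: Optional[int] = None):
        # Table location and an optional table version to pin (assumed arguments).
        self.path = path
        self.version = version

    def _table(self) -> Any:
        # Imported lazily so that deltalake stays an optional dependency.
        from deltalake import DeltaTable

        return DeltaTable(self.path, version=self.version)

    def load_pandas(self) -> Any:
        # DeltaTable.to_pandas() reads the table into a pandas DataFrame.
        return self._table().to_pandas()

    def load_polars(self) -> Any:
        # Imported lazily so that Polars stays an optional dependency.
        import polars as pl

        # Convert through Arrow so the data never passes through pandas.
        return pl.from_arrow(self._table().to_pyarrow_table())
```

Under these assumptions, DeltaLakeSourceSketch("path/to/table").load_polars() would return a Polars DataFrame without starting a Spark session. How the real DeltaDatasetSource would expose the path and version is left open here.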

@harupy harupy added the help wanted We would like help from the community to add this support label May 13, 2024

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.


2 participants