Added a UC uri check to flavor_backend_registry #11990

Merged
harupy merged 16 commits into mlflow:master on May 30, 2024

@kriscon-db kriscon-db commented May 14, 2024

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11990/merge

Checkout with GitHub CLI

gh pr checkout 11990

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Change flavor_backend_registry.py to resolve the model URI through get_artifact_repository, so that Unity Catalog (UC) URIs get the correct, UC-specific artifact repository.

How is this PR tested?

  • [x] Existing unit/integration tests
  • [ ] New unit/integration tests
  • [x] Manual tests

Tested on my local environment with the repro from the ES ticket. This PR solves the incorrect artifact repo bug that exists in this code path for UC.

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?
  • Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
    Bug fixes, doc updates and new features usually go into minor releases.
  • Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
    Bug fixes and doc updates usually go into patch releases.
  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

github-actions bot commented May 14, 2024

Documentation preview for 0906436 will be available when this CircleCI job
completes successfully.

@kriscon-db Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

@kriscon-db kriscon-db requested a review from smurching May 14, 2024 01:32
@@ -38,7 +38,8 @@ def _get_flavor_backend_for_local_model(model=None, build_docker=True, **kwargs)
 def get_flavor_backend(model_uri, **kwargs):
     if model_uri:
         with TempDir() as tmp:
-            if ModelsArtifactRepository.is_models_uri(model_uri):
+            from mlflow import get_registry_uri
+            if ModelsArtifactRepository.is_models_uri(model_uri) and not is_databricks_unity_catalog_uri(get_registry_uri()):
Collaborator

AFAICT we're special-casing model registry URIs here because the logic below

            local_path = _download_artifact_from_uri(
                append_to_uri_path(underlying_model_uri, MLMODEL_FILE_NAME), output_path=tmp.path()
            )

may not work for models:/ URIs (or at least probably didn't when originally written), since URIs like models:/<modelname>/<version>/path/to/file weren't supported.

Instead of special-casing UC, it'd be better to just call get_artifact_repository on model_uri to get the artifact repo, then call download_artifacts, similar to https://github.com/mlflow/mlflow/pull/8764/files#diff-c3efddc751d03e91a2284128768d6761b80f9b03ad48947c9f701dbf743b25aaR78
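
To illustrate the suggestion, here is a toy sketch of why a single repository lookup is preferable to branching on UC at the call site. It does not use MLflow at all: `get_repo`, `GenericRepo`, `WorkspaceModelsRepo`, and `UnityCatalogModelsRepo` are hypothetical stand-ins, and the `startswith("databricks-uc")` check is an assumption made only for the sketch, not MLflow's actual logic.

```python
from urllib.parse import urlparse


class GenericRepo:
    """Hypothetical stand-in for any non-registry artifact repository."""

    def __init__(self, uri):
        self.uri = uri


class WorkspaceModelsRepo(GenericRepo):
    """Hypothetical stand-in for a non-UC model registry repository."""


class UnityCatalogModelsRepo(GenericRepo):
    """Hypothetical stand-in for a UC-aware model registry repository."""


def get_repo(uri, registry_uri=""):
    """Pick a repository for `uri` in one place (toy version of a registry lookup)."""
    if urlparse(uri).scheme == "models":
        # The UC decision lives inside the lookup, not in the caller.
        if registry_uri.startswith("databricks-uc"):
            return UnityCatalogModelsRepo(uri)
        return WorkspaceModelsRepo(uri)
    return GenericRepo(uri)


# The caller never needs an `if is_uc: ...` branch of its own.
repo = get_repo("models:/main.default.my_model/1", "databricks-uc://my-profile")
print(type(repo).__name__)
```

With this shape, a caller such as get_flavor_backend only asks for a repository and downloads from it, which is the behavior the comment above is asking for.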

Collaborator

@smurching smurching left a comment

Thanks @kriscon-db! Could we test this by configuring a Databricks CLI profile locally and then running import mlflow; mlflow.set_registry_uri("databricks-uc://<profile-name>"); mlflow.models.build_docker(...), passing UC model version info?
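
The manual test described above could look roughly like the sketch below. The helper names (`uc_registry_uri`, `uc_model_uri`, `build_uc_model_image`), the profile name, the three-level model name, and the image name are all hypothetical placeholders; only `mlflow.set_registry_uri` and `mlflow.models.build_docker` come from the comment itself. Actually running the last function assumes mlflow is installed, Docker is available, and a Databricks CLI profile is configured.

```python
def uc_registry_uri(profile):
    """Registry URI for a locally configured Databricks CLI profile."""
    return f"databricks-uc://{profile}"


def uc_model_uri(catalog, schema, model, version):
    """models:/ URI for a UC model version (UC model names are three-level)."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def build_uc_model_image(profile, catalog, schema, model, version, image_name="uc-model-test"):
    """Point MLflow at the UC registry, then build a Docker image for one model version."""
    # Imported lazily so the URI helpers above work without mlflow installed.
    import mlflow
    import mlflow.models

    mlflow.set_registry_uri(uc_registry_uri(profile))
    mlflow.models.build_docker(
        model_uri=uc_model_uri(catalog, schema, model, version),
        name=image_name,
    )


# Example invocation (placeholders, not values from this PR):
# build_uc_model_image("my-profile", "main", "default", "my_model", 1)
```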

@github-actions github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) and rn/none (List under Small Changes in Changelogs) labels May 29, 2024
@kriscon-db
Collaborator Author

@mlflow-automation autoformat

Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
@kriscon-db kriscon-db requested a review from smurching May 29, 2024 18:03
            )
            root_uri, artifact_path = _get_root_uri_and_artifact_path(model_uri)
            artifact_repo = get_artifact_repository(root_uri)
            local_path = artifact_repo.download_artifacts(artifact_path, dst_path=tmp.path())
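
For readers unfamiliar with the root/artifact-path split used above, the following is a simplified, stdlib-only sketch of the idea. `split_root_and_artifact_path` is a hypothetical function written for illustration; it is not MLflow's private `_get_root_uri_and_artifact_path`, whose real behavior (local paths, Windows handling, alias-style model URIs) is more involved and may differ.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def split_root_and_artifact_path(uri):
    """Split a URI into (root_uri, artifact_path); simplified illustration only."""
    parsed = urlparse(uri)
    if parsed.scheme == "models":
        # Assumed layout: models:/<name>/<version>[/path/to/file].
        # The root keeps name and version; anything after is the artifact path.
        parts = parsed.path.strip("/").split("/")
        return "models:/" + "/".join(parts[:2]), "/".join(parts[2:])
    # For other schemes, the last path segment is the artifact path.
    artifact_path = posixpath.basename(parsed.path)
    root = urlunparse(parsed._replace(path=posixpath.dirname(parsed.path)))
    return root, artifact_path


print(split_root_and_artifact_path("models:/my_model/3/path/to/file"))
print(split_root_and_artifact_path("s3://bucket/a/b/model"))
```

The repository is then created from the root URI, and the artifact path is what gets passed to download_artifacts.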
Collaborator

Shouldn't we still be downloading the MLMODEL_FILE_NAME here?

Collaborator

Or maybe Model.load is able to figure that out somehow?

Collaborator Author

https://src.dev.databricks.com/mlflow/mlflow@66a2a0c78b6dc5a4cd5518af86b1490459998356/-/blob/mlflow/models/model.py?L593-595

That leads me to believe the work is already being done. I can put it back in this method if you want, though.

Collaborator

Great, no need, was just curious - if tests are passing that means it works :D
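
As a sketch of the behavior discussed in this thread: if a loader is handed a directory, it can append the MLmodel file name itself, which is why the caller no longer needs to download that file explicitly. `resolve_mlmodel_path` is a hypothetical helper, and it is an assumption that the linked lines in model.py do something along these lines.

```python
import os

MLMODEL_FILE_NAME = "MLmodel"


def resolve_mlmodel_path(path):
    """Return the path to the MLmodel file, given either a model directory or the file itself."""
    if os.path.isdir(path):
        # A directory was passed in, so look for the MLmodel file inside it.
        return os.path.join(path, MLMODEL_FILE_NAME)
    return path


# Passing a file path returns it unchanged; a directory gets the file name appended.
print(resolve_mlmodel_path(os.path.join("some_dir_that_does_not_exist", "MLmodel")))
```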

Collaborator

@WeichenXu123 WeichenXu123 left a comment

LGTM!

@harupy harupy merged commit ed6e214 into mlflow:master May 30, 2024
53 of 56 checks passed
B-Step62 pushed a commit to B-Step62/mlflow that referenced this pull request May 30, 2024
Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
B-Step62 pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
Co-authored-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
Labels
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • autoformat
  • patch-2.13.1
  • rn/none: List under Small Changes in Changelogs

5 participants