Compute cluster [Shared] for Service Principal to execute ML-related workflows from GitHub Actions #140

Open
puviarasu17 opened this issue Jan 20, 2024 · 9 comments

@puviarasu17

puviarasu17 commented Jan 20, 2024

We are using the cluster configuration below in our template project, created from mlops-stacks with the Feature Store and Unity Catalog options enabled. When we run it, the job defined in feature-engineering-workflow-asset.yml fails with the exception below when Feature Store tries to create a table in Unity Catalog.

Note: We have the expected 'test' catalog in our metastore and the service principal has the right access.

Cluster Configuration in template project created from mlops-stacks:

new_cluster: &new_cluster
  new_cluster:
    num_workers: 1
    spark_version: 13.3.x-gpu-ml-scala2.12
    node_type_id: g2-standard-4
    custom_tags:
      clusterSource: mlops-stack

Exception in GitHub actions:
ValueError: Catalog 'test' does not exist in the metastore.

To investigate, we ran the same notebook on an all-purpose cluster with Shared access mode and got the same exception. We also get the exception below when we run the SQL query: SELECT CURRENT_METASTORE();

Exception in the notebook attached to the all-purpose Shared cluster for the above SQL:
AnalysisException: [OPERATION_REQUIRES_UNITY_CATALOG] Operation CURRENT_METASTORE requires Unity Catalog enabled.

Setting the Spark config spark.databricks.unityCatalog.enabled to true does not help.

Can you please suggest the correct compute config we should be using for mlops-stacks with unity catalog and feature store enabled?

@vladimirk-db
Contributor

Just to confirm - is your workspace UC enabled?

Also, are you specifying a UC-supported data access mode in your cluster config? See this for more info (also the cluster create API reference).

Hope this helps.
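
As a rough sketch of the second point: in the cluster create API, the access mode is set through the `data_security_mode` field. The snippet below adds that field to the cluster block quoted at the top of this thread. The value `SINGLE_USER` is only an illustration (`USER_ISOLATION` is the API name for Shared), and which mode fits depends on the runtime and on the identity that runs the job.

```yaml
new_cluster: &new_cluster
  new_cluster:
    num_workers: 1
    spark_version: 13.3.x-gpu-ml-scala2.12
    node_type_id: g2-standard-4
    # Illustrative value: SINGLE_USER or USER_ISOLATION (Shared) are the
    # Unity Catalog-capable modes; pick the one your runtime supports.
    data_security_mode: SINGLE_USER
    custom_tags:
      clusterSource: mlops-stack
```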

@puviarasu17
Author

Hello @vladimirk-db, thank you for the response.

Our Databricks workspace is UC enabled. Please refer to the screenshots below.
Screenshot 2024-01-23 at 3 07 04 PM
Screenshot 2024-01-23 at 3 08 40 PM

Our problem is this: when we create a template project from MLOps Stacks with the Feature Store and Model Registry with Unity Catalog options enabled (as shown below), our compute clusters cannot access Unity Catalog in Shared access mode. We need Shared mode so that GitHub can run Databricks jobs with a service principal token.
Screenshot 2024-01-19 at 11 05 16 AM copy

From the Databricks documentation on Unity Catalog limitations, we learned that Unity Catalog is not available in Shared access mode on ML clusters.

We also tried manually creating a compute cluster with Shared access mode and the ML runtime, and got the error below.
Databricks Runtime for Machine Learning does not support User Isolation security mode. Use Single User security mode if you need to access Unity Catalog.
Screenshot 2024-01-23 at 4 18 21 PM

Does this mean an MLOps Stacks template project won't work with the Feature Store and Unity Catalog options? Or can you suggest a compute config that makes it work?

CC: @rajeshvinayagam-lab

@arpitjasa-db
Collaborator

arpitjasa-db commented Jan 24, 2024

@puviarasu17 if you create the jobs with single user access mode, but the single user is the Service Principal, does that work?

@puviarasu17
Author

Hello @arpitjasa-db,

We have tried that, but we could not create a Single User cluster for a service principal, even though the service principal has the "Can Manage" permission on the cluster like any other user.

Below is a screenshot of the cluster permissions. Here, Rajesh Vinayagam is a user and mlops-demo is a service principal.
Screenshot 2024-01-24 at 9 42 48 AM

When we search for a user to assign to the Single User cluster, we can find the user [Rajesh Vinayagam] and assign them. Below is a screenshot of that.
Screenshot 2024-01-24 at 9 43 20 AM

But we get no results when we search for the service principal by its name or by its UUID. The two screenshots below show this.
Screenshot 2024-01-24 at 9 43 45 AM
Screenshot 2024-01-24 at 9 44 49 AM

Because of this, we are blocked. It would be helpful if you could provide us with a cluster config for this scenario.

Thank you.
CC: @rajeshvinayagam-lab

@arpitjasa-db
Collaborator

Hmm, yeah, I see. It seems that the only supported access modes for UC are Single User and Shared, and Shared does not support MLR. Now it seems that Single User does not actually support service principals. Given this, I can think of two workarounds:

  1. Use the Shared access mode with the latest DBR (instead of MLR), but install the latest mlflow as a library into the cluster.
  2. Use a non-UC cluster. MLOps Stacks can be used with a non-UC cluster and you will still have access to Models in UC.
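
A hedged sketch of workaround 1, based on the cluster block from the original report. The runtime version string, node type, and task name are assumptions for illustration, not values confirmed in this thread. `data_security_mode: USER_ISOLATION` is the API name for Shared access mode, and a `libraries` entry with `pypi` is how the Jobs API attaches a package to a task.

```yaml
new_cluster: &new_cluster
  new_cluster:
    num_workers: 1
    # Assumption: a standard (non-ML) runtime, since MLR is rejected in Shared mode.
    spark_version: 13.3.x-scala2.12
    # Assumption: a CPU node type; substitute one available in your workspace.
    node_type_id: n2-highmem-4
    data_security_mode: USER_ISOLATION
    custom_tags:
      clusterSource: mlops-stack

tasks:
  - task_key: write_features  # hypothetical task name
    <<: *new_cluster
    libraries:
      # The standard runtime does not bundle mlflow, so install it from PyPI.
      - pypi:
          package: mlflow
```

Any other ML packages the notebooks import would likely need to be listed the same way.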

@puviarasu17
Author

Hello @arpitjasa-db,

Thank you for the suggestions.

If we go with option 1, we cannot attach GPU-enabled workers and drivers, which we need for model training.
Screenshot 2024-01-24 at 4 09 11 PM

Please let us know whether any workarounds are possible for that. Thank you!

CC: @rajeshvinayagam-lab

@arpitjasa-db
Collaborator

Hi @puviarasu17, I did some more digging, and it looks like it is possible to assign a Single User cluster to a service principal, but you need to sign up for the Preview of that feature.
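
Assuming the preview is enabled for the workspace, the job cluster config might look roughly like the sketch below. `single_user_name` is the Clusters API field that names the identity a Single User cluster is assigned to. The placeholder value is an assumption: it stands for the service principal's application ID, which is not given in this thread.

```yaml
new_cluster: &new_cluster
  new_cluster:
    num_workers: 1
    spark_version: 13.3.x-gpu-ml-scala2.12
    node_type_id: g2-standard-4
    data_security_mode: SINGLE_USER
    # Placeholder (assumption): replace with the service principal's application ID.
    single_user_name: "<service-principal-application-id>"
    custom_tags:
      clusterSource: mlops-stack
```

This keeps the GPU ML runtime from the original config, which Shared access mode does not allow.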

@puviarasu17
Author

Hi @arpitjasa-db, Thank you for the information. It would be helpful if you could provide the link or procedure to sign up for the Preview.
Thank you.
CC: @rajeshvinayagam-lab

@arpitjasa-db
Collaborator

Hi @puviarasu17, you can reach out to your Databricks representative and mention that you want to register for the service principal clusters preview. They should be able to walk you through the steps.
