Administration & Architecture

[Azure Databricks]: Use managed identity to access mlflow models and artifacts

quad_t
New Contributor II

Hello! I am new to Azure Databricks and have a question. In my current setup, I run containerized Python code in an Azure Functions app. This code needs to download some models and artifacts stored via MLflow in our Azure Databricks workspace.

Previously, I did this by setting the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables and then calling `mlflow.set_tracking_uri("databricks")` in my code, and everything worked fine. However, the token is a personal access token (PAT), which I do not like from a security perspective. Ideally, I would like to use the managed identity of the Functions app to authenticate with Databricks. According to the following article, this should be possible: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi-auth

So I essentially repeated the steps in the article. Note that I omitted all account-level authorization steps, since workspace-level authorization is enough for my use case. 

- I created a user-assigned managed identity in Azure.
- I assigned the managed identity to the Functions app.
- I added a new Microsoft Entra ID managed service principal to my Azure Databricks workspace, using the client ID of the managed identity as the application ID.
- I created the corresponding config file `~/.databrickscfg` with a single profile named `[AZURE_MI_WORKSPACE]`, containing the parameters `host` (my Azure Databricks workspace URL), `azure_workspace_resource_id` (the resource ID of my Azure Databricks workspace), `azure_client_id` (the client ID of the managed identity), `azure_tenant_id` (my Azure tenant ID), and `azure_use_msi` set to `true`, just as in the config in the article referenced above.
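For reference, the profile described in the last bullet could be generated programmatically as below. This is a sketch only: `write_mi_profile` is a hypothetical helper, all values are placeholders, and, per the accepted answer further down, MLflow did not authenticate successfully with this profile. The helper keeps any profiles already in the file rather than overwriting them.

```python
import configparser
from pathlib import Path


def write_mi_profile(path: Path, profile: str, host: str, resource_id: str,
                     client_id: str, tenant_id: str) -> None:
    """Add or replace a managed-identity profile in a .databrickscfg-style file
    (hypothetical helper; keys mirror the profile described in the question)."""
    # interpolation=None so values containing '%' are written literally
    cfg = configparser.ConfigParser(interpolation=None)
    # Preserve existing profiles (e.g. [DEFAULT]) if the file already exists
    if path.exists():
        cfg.read(path)
    cfg[profile] = {
        "host": host,
        "azure_workspace_resource_id": resource_id,
        "azure_client_id": client_id,
        "azure_tenant_id": tenant_id,
        "azure_use_msi": "true",
    }
    with path.open("w") as fh:
        cfg.write(fh)
```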

Then, I changed my code to `mlflow.set_tracking_uri("databricks://AZURE_MI_WORKSPACE")`. The code proceeds to read the information from the `.databrickscfg` file, since I get the output

loading AZURE_MI_WORKSPACE profile from ~/.databrickscfg: host, azure_workspace_resource_id, azure_client_id, azure_use_msi, azure_tenant_id

But when setting the tracking URI, I get the following error:

Reading Databricks credential configuration failed with MLflow tracking URI 'databricks://AZURE_MI_WORKSPACE'. Please ensure that the 'databricks-sdk' PyPI library is installed, the tracking URI is set correctly, and Databricks authentication is properly configured. The tracking URI can be either 'databricks' (using 'DEFAULT' authentication profile) or 'databricks://{profile}'. You can configure Databricks authentication in several ways, for example by specifying environment variables (e.g. DATABRICKS_HOST + DATABRICKS_TOKEN) or logging in using 'databricks auth login'.

Do you have any leads on what could be wrong here? I have triple-checked the parameters in the config file and they are definitely correct. I am wondering whether I made some conceptual error and MLflow tracking simply can't be done via managed identity authentication for some reason.
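The MLflow message above is generic and wraps the underlying failure. One way to see what `databricks-sdk` itself reports for the same profile is a direct login attempt, sketched below under the assumption that `databricks-sdk` is installed. `check_databricks_profile` is a hypothetical helper name, and the broad `except` is deliberate so the SDK's own error text is surfaced rather than swallowed by MLflow.

```python
def check_databricks_profile(profile: str) -> str:
    """Try to authenticate with a ~/.databrickscfg profile via databricks-sdk
    and report the outcome (hypothetical diagnostic helper)."""
    try:
        # Imported lazily so a missing databricks-sdk is reported, not raised
        from databricks.sdk import WorkspaceClient

        client = WorkspaceClient(profile=profile)
        # For a service principal, user_name is its application ID
        me = client.current_user.me()
        return f"authenticated as {me.user_name}"
    except Exception as exc:  # report any config/auth error verbatim
        return f"authentication failed: {exc}"
```

For example, running `print(check_databricks_profile("AZURE_MI_WORKSPACE"))` inside the Functions app would print either the identity it authenticated as or the SDK's error message.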


ali_daei
New Contributor II

Hi @quad_t, were you able to find a solution to this problem? I'm having similar issues when trying to use MSI to connect to MLflow.

quad_t
New Contributor II

Hi @ali_daei 

Yes, indeed! I discussed this in a Microsoft Q&A forum thread and got an answer that works. See the answer here: https://learn.microsoft.com/en-us/answers/questions/2276345/use-managed-identity-to-access-mlflow-mo...

In short: do NOT use the client ID, tenant ID, etc.; stick to the usual `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variable approach. The token, however, must be generated for the managed identity you want to access your workspace with. If you are using the Python SDK, this can be done with `ManagedIdentityCredential` from the `azure.identity` package (see the code snippet in the accepted answer at the Microsoft forum link).

One thing that confused me at first was the Azure Databricks resource App ID that you need to use to generate the token. It looks like a custom UUID, but it is apparently a well-known STATIC ID that is the same for all Azure Databricks resources. So when generating the token, always use:

`token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default")`

The ID `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` is ALWAYS the same for every Azure Databricks resource. Again, check out the link above for a more detailed discussion.
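The approach described in this reply can be sketched as follows, assuming the `azure-identity` and `mlflow` packages are installed. This is not the snippet from the linked answer, only an illustration of the same idea: `configure_databricks_env`, `token_needs_refresh`, and `use_managed_identity_for_mlflow` are hypothetical names, and the workspace URL and client ID are placeholders you would supply.

```python
import os
import time

# Static application ID of the Azure Databricks resource in Microsoft Entra ID
# (the value quoted in the reply); "/.default" requests its default scope.
AZURE_DATABRICKS_APP_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
DATABRICKS_SCOPE = f"{AZURE_DATABRICKS_APP_ID}/.default"


def configure_databricks_env(host: str, credential) -> int:
    """Fetch an Entra ID token for the identity behind `credential` and expose it
    via the DATABRICKS_HOST / DATABRICKS_TOKEN variables (hypothetical helper).

    `credential` is any object with an azure.identity-style get_token(*scopes),
    e.g. ManagedIdentityCredential. Returns the token expiry as a Unix timestamp.
    """
    access_token = credential.get_token(DATABRICKS_SCOPE)
    os.environ["DATABRICKS_HOST"] = host
    os.environ["DATABRICKS_TOKEN"] = access_token.token
    return access_token.expires_on


def token_needs_refresh(expires_on: int, margin_s: int = 300) -> bool:
    """True once the token is within `margin_s` seconds of expiring (hypothetical)."""
    return time.time() >= expires_on - margin_s


def use_managed_identity_for_mlflow(host: str, client_id: str) -> int:
    """Point MLflow at Databricks using the user-assigned managed identity."""
    # Lazy imports: only works where the managed identity is attached (e.g. the Functions app)
    import mlflow
    from azure.identity import ManagedIdentityCredential

    credential = ManagedIdentityCredential(client_id=client_id)
    expires_on = configure_databricks_env(host, credential)
    mlflow.set_tracking_uri("databricks")
    return expires_on
```

Entra ID access tokens expire (`AccessToken.expires_on` is a Unix timestamp), so a long-running Functions host would presumably need to call `configure_databricks_env` again once `token_needs_refresh` returns `True`; whether MLflow picks up the new value without a restart is worth verifying in your setup.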

I hope this helps!
