Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

[Azure Databricks]: Use managed identity to access mlflow models and artifacts

quad_t
New Contributor

Hello! I am new to Azure Databricks and have a question. In my current setup, I am running containerized Python code in an Azure Functions app. In this code, I need to download some models and artifacts stored via MLflow in our Azure Databricks workspace.

Previously, I did this by setting the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables; within my code I just set `mlflow.set_tracking_uri("databricks")` and everything worked fine. However, the token is a PAT, which I do not like from a security perspective. Ideally, I would like to use the managed identity of the Functions app to authenticate with Databricks. According to the following article, this should be possible: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi-auth

So I essentially repeated the steps in the article. Note that I omitted all account-level authorization steps, since workspace-level authorization is enough for my use case. 

- I created a user-assigned managed identity in Azure
- I assigned the managed identity to the Functions app
- I added a new Entra ID managed service principal in my Azure Databricks workspace, using the client ID of the managed identity as the application ID
- I created the corresponding config file `~/.databrickscfg` with a single profile named `[AZURE_MI_WORKSPACE]`, containing the parameters `host` (my Azure Databricks workspace URL), `azure_workspace_resource_id` (the resource ID of my Azure Databricks workspace), `azure_client_id` (the client ID of the managed identity), and `azure_tenant_id` (my Azure tenant ID), and I set `azure_use_msi` to `true`, just as in the config in the referenced article above
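For reference, the profile in `~/.databrickscfg` looks like this (all values are placeholders):

```ini
[AZURE_MI_WORKSPACE]
host                        = https://adb-<workspace-id>.<n>.azuredatabricks.net
azure_workspace_resource_id = /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/workspaces/<workspace-name>
azure_client_id             = <client-id-of-the-managed-identity>
azure_tenant_id             = <tenant-id>
azure_use_msi               = true
```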

Then I changed my code to `mlflow.set_tracking_uri("databricks://AZURE_MI_WORKSPACE")`. The code does read the information from the `.databrickscfg` file, since I get the output:

loading AZURE_MI_WORKSPACE profile from ~/.databrickscfg: host, azure_workspace_resource_id, azure_client_id, azure_use_msi, azure_tenant_id

But when setting the tracking URI, I get the following error:

Reading Databricks credential configuration failed with MLflow tracking URI 'databricks://AZURE_MI_WORKSPACE'. Please ensure that the 'databricks-sdk' PyPI library is installed, the tracking URI is set correctly, and Databricks authentication is properly configured. The tracking URI can be either 'databricks' (using 'DEFAULT' authentication profile) or 'databricks://{profile}'. You can configure Databricks authentication in several ways, for example by specifying environment variables (e.g. DATABRICKS_HOST + DATABRICKS_TOKEN) or logging in using 'databricks auth login'.

Do you have any leads on what could be wrong here? I triple-checked the parameters in the config file and they are definitely correct. I am wondering whether I made some kind of conceptual error and MLflow tracking can't be done via managed identity auth for some reason.

