Hello! I am new to Azure Databricks and have a question: in my current setup, I am running some containerized Python code in an Azure Functions app. In this code, I need to download some models and artifacts stored via MLflow in our Azure Databricks workspace.
Previously, I did this by setting the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables; within my code I just called `mlflow.set_tracking_uri("databricks")` and everything worked fine. However, the token is a PAT, which I do not like from a security perspective. Ideally, I would like to use the managed identity of the Functions app to authenticate with Databricks. According to the following article, this should be possible: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/azure-mi-auth
So I essentially repeated the steps in the article. Note that I omitted all account-level authorization steps, since workspace-level authorization is enough for my use case.
- I created a user-assigned managed identity in Azure
- I assigned the managed identity to the Functions app
- I added a new Microsoft Entra ID managed service principal in my Azure Databricks workspace, using the client ID of the managed identity as the application ID
- I created the corresponding config file `~/.databrickscfg` with a single profile named `[AZURE_MI_WORKSPACE]`, containing the parameters `host` (my Azure Databricks workspace URL), `azure_workspace_resource_id` (the resource ID of my Azure Databricks workspace), `azure_client_id` (the client ID of the managed identity), and `azure_tenant_id` (my Azure tenant ID), and I set `azure_use_msi` to `true`, just as in the config in the referenced article above
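Concretely, my `~/.databrickscfg` has the following structure (all values below are placeholders; the layout matches the article):

```ini
[AZURE_MI_WORKSPACE]
host                        = https://adb-1234567890123456.7.azuredatabricks.net
azure_workspace_resource_id = /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>
azure_client_id             = <client-id-of-the-managed-identity>
azure_tenant_id             = <tenant-id>
azure_use_msi               = true
```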
Then I changed my code to `mlflow.set_tracking_uri("databricks://AZURE_MI_WORKSPACE")`. The code does pick up the information from the `.databrickscfg` file, since I get the output
loading AZURE_MI_WORKSPACE profile from ~/.databrickscfg: host, azure_workspace_resource_id, azure_client_id, azure_use_msi, azure_tenant_id
But when setting the tracking URI, I get the following error:
Reading Databricks credential configuration failed with MLflow tracking URI 'databricks://AZURE_MI_WORKSPACE'. Please ensure that the 'databricks-sdk' PyPI library is installed, the tracking URI is set correctly, and Databricks authentication is properly configured. The tracking URI can be either 'databricks' (using 'DEFAULT' authentication profile) or 'databricks://{profile}'. You can configure Databricks authentication in several ways, for example by specifying environment variables (e.g. DATABRICKS_HOST + DATABRICKS_TOKEN) or logging in using 'databricks auth login'.
Do you have any leads on what could be wrong here? I triple-checked the parameters in the config file and they are definitely correct. I was wondering whether I made some kind of conceptual error and MLflow tracking can't be done via managed identity auth for some reason.