How to authenticate databricks provider in terraform using a system-managed identity?

felix_counter
New Contributor III
Hello,
I want to authenticate the databricks provider using a system-managed identity in Azure. The identity resides in a different subscription than the databricks workspace:
 
[Attached image: managed identity.png]
According to the "authentication" section of the databricks provider documentation, I performed the following steps:

 

  1. Grant the (system-assigned) managed identity the "Contributor" role on Subscription B. I can confirm via Azure portal that the app service behind the managed identity indeed has the "Contributor" role on the subscription in which the databricks workspace resides.
  2. Register the managed identity as a databricks service principal in the databricks workspace using its application id.
  3. Initialize the databricks provider with the following arguments:
    • host: host address of the databricks workspace
    • azure_workspace_resource_id: resource ID of azure workspace, obtained from an "azurerm_databricks_workspace" data object
    • azure_client_id: application id of system-managed identity / registered databricks service principal. 
    • azure_use_msi: true
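
In Terraform, the configuration described in step 3 looks roughly like the sketch below. The workspace and resource group names are placeholders, and the variable `managed_identity_client_id` is a hypothetical name introduced only for illustration.

```hcl
terraform {
  required_providers {
    azurerm    = { source = "hashicorp/azurerm" }
    databricks = { source = "databricks/databricks" }
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical variable: application (client) ID of the managed identity.
variable "managed_identity_client_id" {
  type = string
}

# Look up the existing workspace (placeholder names).
data "azurerm_databricks_workspace" "this" {
  name                = "<workspace name>"
  resource_group_name = "<resource group name>"
}

# The four provider arguments listed in step 3.
provider "databricks" {
  host                        = data.azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.this.id
  azure_client_id             = var.managed_identity_client_id
  azure_use_msi               = true
}
```
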
I tried to create a resource using this provider.
The terraform plan step looks good, i.e. the resource I want to create shows up in the planning step. However, during the apply step I encounter the following error:
 
 Error: cannot create [redacted]: inner token: token error: ***"error":"invalid_request","error_description":"Identity not found"***
 
This error appears regardless of which resource I create (I tried several). The problem seems to lie in the authentication with the managed identity.
 
Is it possible to authenticate the databricks provider using a system-managed identity? If yes, what would be the correct configuration for the provider and the environment in this setup? I am a bit confused on how to point the provider at the right identity to use. In order to point the provider to the correct identity / SPN, I set the parameter "azure_client_id" to the managed identity's application id. However, I am not sure whether this is correct.
 
4 REPLIES

Kaniz
Community Manager

Hi @felix_counter, let's look at how to authenticate with Databricks using a managed identity in Azure. Managed identities for Azure resources (formerly known as Managed Service Identity, or MSI) provide a secure way to authenticate applications and services without managing explicit credentials.

 

Here are the steps to set up Azure managed identities authentication for Databricks:

 

Create a User-Assigned Managed Identity:

  • First, create a user-assigned managed identity in your Azure subscription. This identity will represent your Databricks service principal.
  • You can do this via the Azure Portal or using Terraform. Make sure to note down the client ID of this managed identity.

Assign the Managed Identity to Your Databricks Account and Workspace:

  • Assign the user-assigned managed identity to both your Azure Databricks account and the specific workspace within that account.
  • This step ensures that the managed identity has the necessary permissions to interact with Databricks resources.

Configure an Azure Virtual Machine (VM):

  • You’ll need an Azure VM (or another resource that supports managed identities) to programmatically call Azure Databricks operations.
  • Assign the user-assigned managed identity to this VM.

Install and Configure the Databricks CLI on the Azure VM:

  • Install the Databricks Command Line Interface (CLI) on your Azure VM.
  • Configure the Databricks CLI to use Azure managed identities authentication for Databricks by specifying the assigned managed identity.

Run Commands with the Databricks CLI:

  • Use the Databricks CLI to automate your Azure Databricks account and workspace operations.
  • The CLI will automatically use the managed identity to obtain Azure Active Directory tokens for authentication.

Here’s how you can set up the managed identity for your existing Web App:

  1. In the Azure Portal, open your Web App.
  2. Click on “Identity” and enable “System-assigned managed identity.”
  3. Copy the generated “Object (principal) ID” and search for the associated Enterprise Application in Azure Active Directory.
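
If the Web App is managed with Terraform rather than through the portal, the same setting can be sketched as below. This assumes a Linux Web App and the azurerm provider. All names are placeholders, the output name is hypothetical, and an existing app would need to be imported into state before this resource block could manage it.

```hcl
terraform {
  required_providers {
    azurerm = { source = "hashicorp/azurerm" }
  }
}

provider "azurerm" {
  features {}
}

# Existing App Service plan hosting the Web App (placeholder names).
data "azurerm_service_plan" "plan" {
  name                = "<service plan name>"
  resource_group_name = "<resource group name>"
}

resource "azurerm_linux_web_app" "app" {
  name                = "<web app name>"
  resource_group_name = data.azurerm_service_plan.plan.resource_group_name
  location            = data.azurerm_service_plan.plan.location
  service_plan_id     = data.azurerm_service_plan.plan.id

  site_config {}

  # Equivalent of enabling "System-assigned managed identity" in the portal.
  identity {
    type = "SystemAssigned"
  }
}

# Object (principal) ID of the system-assigned identity.
output "web_app_principal_id" {
  value = azurerm_linux_web_app.app.identity[0].principal_id
}
```
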

Remember that managed identities for Azure resources are different from Microsoft Entra ID (formerly Azure Active Directory) service principals. Databricks supports both types of authentication, so choose the one that best fits your use case.

By following these steps, you should be able to authenticate the Databricks provider using a system-managed identity. If you encounter any issues, double-check the configuration and ensure that the managed identity has the necessary permissions in both the Databricks workspace and the Azure subscription where it resides. 

 

For more detailed information, refer to the official Azure Databricks documentation on managed identities authentication.

felix_counter
New Contributor III

Dear @Kaniz,

thanks a lot for your response describing the step-by-step guide to authenticate Databricks using a managed identity.

However, to the best of my understanding, this is not what I want to achieve. To recap, my goal is to use the system-assigned (i.e., not a user-assigned) managed identity of a web app to authenticate the Terraform databricks provider (i.e., not the CLI). I would be very grateful if you could provide a similar step-by-step guide for this setup.

felix_counter
New Contributor III

I also tried to authenticate using a user-assigned managed identity. In detail, I performed the following steps using Terraform:

  1. Create a user-assigned managed identity in the same resource group as the databricks workspace
  2. Create a databricks service principal setting 'application_id' to the client id of the managed identity. 
  3. Assign the managed identity the "Contributor" role on the subscription in which the databricks workspace is located.
  4. Declare a databricks provider setting 'azure_use_msi' to true, 'host' to the databricks workspace url, 'azure_workspace_resource_id' to the resource id of the databricks workspace, and 'azure_client_id' to the application id of the managed identity.
  5. Create a databricks token using said provider
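
The five steps above can be sketched in Terraform roughly as follows. All names and the region are placeholders. The `bootstrap` provider alias is an assumption: the steps do not say which credentials were used to register the service principal, so the sketch uses a second provider that relies on the default authentication chain.

```hcl
terraform {
  required_providers {
    azurerm    = { source = "hashicorp/azurerm" }
    databricks = { source = "databricks/databricks" }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

data "azurerm_databricks_workspace" "this" {
  name                = "<workspace name>"
  resource_group_name = "<resource group name>"
}

# Step 1: user-assigned identity in the workspace's resource group.
resource "azurerm_user_assigned_identity" "tf" {
  name                = "<identity name>"
  resource_group_name = data.azurerm_databricks_workspace.this.resource_group_name
  location            = "<azure region>"
}

# Assumed helper provider, used only to register the service principal.
provider "databricks" {
  alias = "bootstrap"
  host  = data.azurerm_databricks_workspace.this.workspace_url
}

# Step 2: register the identity as a databricks service principal.
resource "databricks_service_principal" "msi" {
  provider       = databricks.bootstrap
  application_id = azurerm_user_assigned_identity.tf.client_id
  display_name   = "<display name>"
}

# Step 3: "Contributor" on the subscription containing the workspace.
resource "azurerm_role_assignment" "contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_user_assigned_identity.tf.principal_id
}

# Step 4: provider authenticating through the managed identity.
# Note: azure_client_id is only known after the identity has been created.
provider "databricks" {
  alias                       = "msi"
  host                        = data.azurerm_databricks_workspace.this.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.this.id
  azure_client_id             = azurerm_user_assigned_identity.tf.client_id
  azure_use_msi               = true
}

# Step 5: token created with the MSI-authenticated provider.
resource "databricks_token" "pat" {
  provider = databricks.msi
  comment  = "created via managed identity"
}
```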

The same error ("Identity not found") occurs during the terraform apply of step 5 (token creation). I also tried creating other resources, they all fail with above-stated error message. @alexott, do you have a suggestion?

Thanks a lot for your support! 

FarBo
New Contributor III

@felix_counter 

I think I have your answer.

To create a databricks provider to manage your workspace using an SPN, you need to create the provider like this:

provider "databricks" {
  alias               = "workspace"
  host                = "<your workspace URL>"
  azure_client_id     = "<Application ID of the SPN>"
  azure_client_secret = "<Application secret of the SPN>"
  azure_tenant_id     = "<Your Azure subscription tenant ID>"
}

I store all these credentials as secrets in my Azure Key Vault. I then define data sources that retrieve the secret values from the Key Vault and pass them into the databricks provider definition. You need the azurerm provider for this. Below is the full block:

# Key Vault that holds the workspace URL and the SPN credentials
data "azurerm_key_vault" "key_vault" {
  name                = "<your keyvault_name>"
  resource_group_name = "<your rg_name>"
}

data "azurerm_key_vault_secret" "workspace_url" {
  name         = "<Workspace-URL>"
  key_vault_id = data.azurerm_key_vault.key_vault.id
}

data "azurerm_key_vault_secret" "workspace_admin_spn_app_id" {
  name         = "<Workspace-ADMINSPN-APPLICATIONID>"
  key_vault_id = data.azurerm_key_vault.key_vault.id
}

data "azurerm_key_vault_secret" "workspace_admin_spn_app_secret" {
  name         = "<Workspace-ADMINSPN-APPLICATIONSECRET>"
  key_vault_id = data.azurerm_key_vault.key_vault.id
}

data "azurerm_key_vault_secret" "tenant_id" {
  name         = "<AZURE-TENANTID>"
  key_vault_id = data.azurerm_key_vault.key_vault.id
}

# Provider authenticated with the SPN's client ID and secret
provider "databricks" {
  alias               = "workspace"
  host                = data.azurerm_key_vault_secret.workspace_url.value
  azure_client_id     = data.azurerm_key_vault_secret.workspace_admin_spn_app_id.value
  azure_client_secret = data.azurerm_key_vault_secret.workspace_admin_spn_app_secret.value
  azure_tenant_id     = data.azurerm_key_vault_secret.tenant_id.value
}

 
