
Databricks service principal token federation on Kubernetes

sparkplug
New Contributor III

Hi 

I am trying to create a service principal federation policy against an AKS cluster, but I am struggling to make it work without any examples.

It would be great if you could share examples of how this would work for a service account.
Additionally, I would like to know: does Databricks inject tokens into the pod, or does it need the Azure workload identity federated token to communicate with Databricks? In the latter case, how is it different from the OAuth M2M flow?

1 ACCEPTED SOLUTION

sarahbhord
Databricks Employee

Hey sparkplug - Thanks for reaching out! 

To enable a service account in AKS to authenticate to Databricks using workload identity federation, you must create a service principal federation policy in Databricks that trusts tokens issued by the Kubernetes cluster, which acts as the OIDC provider.

Federation Policy Example (for a Kubernetes Service Account):

Key parameters:

  • issuer (string): the OIDC issuer URL
  • audiences (list): the audience values accepted in the token's aud claim
  • subject (string): the Kubernetes service account identifier
  • subject_claim (string, optional): the claim to match subject against; defaults to sub

Sample policy (using the Databricks CLI):

databricks account service-principal-federation-policy create <SERVICE_PRINCIPAL_NUMERIC_ID> --json \
'{
  "oidc_policy": {
    "issuer": "https://kubernetes.default.svc",
    "audiences": ["https://kubernetes.default.svc"],
    "subject": "system:serviceaccount:my-namespace:my-serviceaccount"
  }
}'

Corresponding JWT Claims:

{
  "iss": "https://kubernetes.default.svc",
  "aud": ["https://kubernetes.default.svc"],
  "sub": "system:serviceaccount:my-namespace:my-serviceaccount"
}
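
To debug a policy mismatch, it can help to decode the projected service account token and compare its claims against the policy. A minimal sketch using PyJWT; the token path is an assumption, so use whatever path your pod projects the token to:

import jwt  # PyJWT

# Hypothetical path; adjust to where your pod projects the token.
TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"

with open(TOKEN_FILE) as f:
    token = f.read().strip()

# Decode without verifying the signature, purely to inspect the claims.
claims = jwt.decode(token, options={"verify_signature": False})
print(claims["iss"], claims["aud"], claims["sub"])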

Terraform example (resource arguments): https://github.com/databricks/terraform-provider-databricks/blob/main/docs/resources/service_princip...

Databricks Integration: Token Handling in AKS Pods

Databricks does not inject Databricks tokens directly into pods in AKS clusters.

How does the authentication work?

  • With Azure Workload Identity Federation, the Kubernetes pod receives a short-lived, projected service account token.
  • The pod exchanges this token with Microsoft Entra ID (formerly Azure AD) for an access token, which can then be used to access Databricks (or any other Azure resource).
  • The workflow for a pod:
    • It receives a service account token via a projected volume.
    • The application in the pod uses this token (under the federation policy described above) to exchange for a Databricks OAuth token; a sketch of this exchange follows the list.
    • The Databricks CLI/SDK (or your code) handles the token exchange using the environment variables or token file paths provided by the Azure Workload Identity webhook.
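
A minimal sketch of the direct exchange against the Databricks token federation endpoint; the workspace URL and token path are placeholders, and the parameters follow the standard OAuth token-exchange grant:

import requests

# Hypothetical workspace URL and projected token path.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"

with open(TOKEN_FILE) as f:
    subject_token = f.read().strip()

# Exchange the Kubernetes-issued JWT for a short-lived Databricks OAuth token.
resp = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "scope": "all-apis",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": subject_token,
    },
)
resp.raise_for_status()
databricks_token = resp.json()["access_token"]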

YAML Pod Configuration Example (key extracts):

spec:
  serviceAccountName: my-serviceaccount
  volumes:
    - name: azure-projected-service-account-token
      projected:
        sources:
          - serviceAccountToken:
              audience: api://AzureADTokenExchange
              expirationSeconds: 3600
              path: azure-projected-service-account-token
  containers:
    - name: my-container
      volumeMounts:
        - mountPath: /databricks/secrets/azure-projected-service-account-token
          name: azure-projected-service-account-token
          readOnly: true
      env:
        - name: AZURE_CLIENT_ID
          value: <client-id>
        - name: AZURE_TENANT_ID
          value: <tenant>
        - name: AZURE_AUTHORITY_HOST
          value: "https://login.microsoftonline.com/"
        - name: AZURE_FEDERATED_TOKEN_FILE
          value: "/databricks/secrets/azure-projected-service-account-token/azure-projected-service-account-token"
  • The Databricks SDK/CLI will look for AZURE_FEDERATED_TOKEN_FILE, perform the token exchange, and obtain a Databricks OAuth token.
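
A minimal sketch of that flow, assuming the webhook has injected AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_FEDERATED_TOKEN_FILE (WorkloadIdentityCredential reads these variables itself) and that DATABRICKS_HOST is set on the pod:

import os

from azure.identity import WorkloadIdentityCredential
from databricks.sdk import WorkspaceClient

# Picks up AZURE_CLIENT_ID, AZURE_TENANT_ID, and
# AZURE_FEDERATED_TOKEN_FILE from the environment.
credential = WorkloadIdentityCredential()

# 2ff814a6-... is the well-known AzureDatabricks first-party application ID.
azure_token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default")

w = WorkspaceClient(host=os.environ["DATABRICKS_HOST"], token=azure_token.token)
print(w.current_user.me().user_name)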

Databricks Token Federation vs. OAuth M2M Flow

Workload Identity Federation (what you configure here)

  • The workload (your application in a pod) obtains a federated (OIDC) token (from Kubernetes/Azure), then exchanges it for a short-lived Databricks OAuth token via the Databricks federation endpoint.
  • No long-lived client secrets or tokens are managed or injected by Databricks.
  • This model is more secure and easier to operate at scale (automatic rotation, least privilege, reduced secret sprawl). See https://docs.databricks.com/aws/en/dev-tools/auth/oauth-federation

OAuth M2M (Machine-to-Machine) Flow

  • Typically, the service (machine) uses a client_id and client_secret pre-shared with Databricks to obtain tokens.
  • Secrets must be managed, rotated, and injected into workloads (increasing risk if leaked).
  • Not as dynamic or tightly scoped as workload identity federation.
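
For contrast, a minimal M2M sketch with the Python SDK; the host and credentials are placeholders, and passing client_id/client_secret selects the SDK's OAuth M2M authentication:

from databricks.sdk import WorkspaceClient

# Hypothetical host and credentials for a service principal with an
# OAuth secret; the SDK obtains and refreshes short-lived tokens with them.
w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    client_id="<service-principal-application-id>",
    client_secret="<oauth-secret>",
)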


5 REPLIES


Advika
Databricks Employee

Hi @sparkplug, just checking in: did the guidance shared above help address your query? If it did, please consider marking it as the accepted answer.

sparkplug
New Contributor III

@sarahbhord Isn't this the same federation policy that I would configure on Azure for a service principal, given that I will always use an Azure service principal or managed identity?

I am not able to make the databricks-sdk pick up the environment variables; it constantly complains that there is no default auth set:

ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

sparkplug
New Contributor III

I am currently using a two-step process: logging in with the Azure identity library, getting an access token from Azure using the Databricks scope, and then using that token to authorize against Databricks. I would like to use the `env-oidc` auth type instead, but it doesn't seem to work. I am not sure whether it is trying to write .databrickscfg, and I am not sure how token refresh is handled in the SDK.


import os

from azure.identity import WorkloadIdentityCredential
from databricks.sdk import WorkspaceClient

# Scope of the well-known AzureDatabricks first-party application
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

def get_workspace_client() -> WorkspaceClient:
    credential = WorkloadIdentityCredential(client_id=os.environ["APPLICATION_ID"])
    token = credential.get_token(DATABRICKS_SCOPE).token
    return WorkspaceClient(host=os.environ["DATABRICKS_HOST"], token=token)

I am using:
databricks-sdk==0.67.0
azure-identity==1.25.0


Any suggestions are appreciated.

sarahbhord
Databricks Employee

Gotcha. Can you describe what is not working? Any specific error or behaviour?