Thanks for your reply @nicole_lu_PM,

We use Azure DevOps and Microsoft Entra ID service principals for automation. In our CI/CD pipelines we authenticate as a service principal via the Azure CLI and then interact with Databricks workspaces through the Databricks SDK/CLI. We have no pain points with this process.

If I understand the documentation and blog post correctly, the authentication described there covers the connection Databricks Git folder <-> Git provider via a Git credential for an Entra ID SP. We currently do this essentially as described in those documents, with one difference: you don't really need to create an additional client secret for the Entra ID SP in the Databricks account, because the Entra ID client secret generated in Step 1 of the documentation suffices (both the SDK and the CLI can pick up the authentication context of az login and request the Entra ID tokens in the background).

Our pain point, and where I believe Databricks' platform ergonomics are still immature, is the connection Job running as service principal <-> Git provider when you use version-controlled source code in a Databricks job.

  • If you trigger job runs via API request, you can add a step beforehand that fetches a fresh Entra ID token and updates the SP's Git credential with it just in time for the job run (which will check out the code from the upstream repo).
  • But what if you'd like to let Databricks orchestrate your job runs by triggering them on a schedule? There's no way to add a manual token refresh before the job run in that case, and even if it were possible, the real solution would be for Databricks to support machine-to-machine OAuth from Databricks to Git providers (I'm well aware that m2m is supported towards Databricks).
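
For anyone who wants to reproduce the just-in-time workaround from the first bullet, here is a rough sketch of what such a pre-run step could look like. It is a sketch under assumptions, not our exact pipeline: it assumes the pipeline has already run az login as the SP, that the Git provider is Azure DevOps, and that "azureDevOpsServicesAad" is the right git_provider value for your setup. The function names, the git_username parameter, and the host, credential ID, and job ID are placeholders I made up for illustration.

```python
import json
import subprocess
import urllib.request

# Entra ID application IDs used as token resources (well-known, fixed values).
AZURE_DEVOPS_RESOURCE = "499b84ac-1321-427f-aa17-267ca6975798"
AZURE_DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def fetch_entra_token(resource: str) -> str:
    """Request a token from the Azure CLI session the SP is already logged into."""
    result = subprocess.run(
        ["az", "account", "get-access-token", "--resource", resource, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["accessToken"]


def build_calls(host, credential_id, job_id, git_token, git_username,
                git_provider="azureDevOpsServicesAad"):  # assumed provider value
    """Return the two REST calls, in order, as (method, url, payload) tuples."""
    base = host.rstrip("/")
    return [
        # 1. Overwrite the SP's Git credential with the fresh Entra ID token.
        ("PATCH", f"{base}/api/2.0/git-credentials/{credential_id}",
         {"git_provider": git_provider,
          "git_username": git_username,
          "personal_access_token": git_token}),
        # 2. Trigger the job, which checks out the code using that credential.
        ("POST", f"{base}/api/2.1/jobs/run-now", {"job_id": job_id}),
    ]


def send(method, url, payload, bearer):
    """Send one JSON request to the Databricks REST API and decode the reply."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method=method,
        headers={"Authorization": f"Bearer {bearer}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else {}


def refresh_and_run(host, credential_id, job_id, git_username):
    """Refresh the Git credential, then start the job; returns the run-now reply."""
    git_token = fetch_entra_token(AZURE_DEVOPS_RESOURCE)
    databricks_token = fetch_entra_token(AZURE_DATABRICKS_RESOURCE)
    calls = build_calls(host, credential_id, job_id, git_token, git_username)
    replies = [send(method, url, payload, databricks_token)
               for method, url, payload in calls]
    return replies[-1]
```

In a pipeline step you would call something like refresh_and_run("https://<your-workspace-host>", <credential id>, <job id>, "<git username>") right after az login. The point of the sketch is that this step has to live outside Databricks, which is exactly why it cannot be attached to a job that Databricks triggers on its own schedule.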

Identity providers do support such use cases (Microsoft Entra ID with Workload Identity Federation, GitHub with OAuth Apps). I'll be happy when Databricks supports this too and we as platform engineers can get rid of the workarounds we currently need to make up for it.