07-08-2025 07:24 AM
In our Dataplatform, our jobs are defined in a dataplatform_jobs.yml within a Databricks Asset Bundle, and then pushed to Databricks via an Azure DevOps pipeline (Azure DevOps is where our codebase resides). Currently, this results in workflows that are created by the Dataplatform Service Principal but run as the username of a specific colleague.
We'd like to change this so that "Run as" is also the Service Principal. This makes maintenance easier, and we won't run into trouble if, for example, this colleague leaves the team. However, our workflows are connected to our DevOps repo and run on the latest version of our dev/test/acc/prd branch. When run as a user this works fine, since that user's PAT is used for authentication. If we change it to sp-dataplatform, we run into authentication issues.
We could add a PAT for sp-dataplatform manually, but then this is still tied to a specific user account. This doesn't really solve the issue.
We also tried the Azure DevOps Services (Azure Active Directory) option for Git integration on the service principal, but I believe this is only used to pull Databricks repos to DevOps, rather than the other way around?
There are a lot of links and threads related to this, such as:
- https://community.databricks.com/t5/data-engineering/use-azure-service-principal-to-access-azure-dev...
- https://learn.microsoft.com/en-us/azure/databricks/repos/automate-with-ms-entra
- https://learn.microsoft.com/en-us/azure/databricks/jobs/how-to/run-jobs-with-service-principals
- https://community.databricks.com/t5/data-engineering/run-task-as-service-principal-with-code-in-azur...
I've experimented with these options as mentioned, but I think they all serve a slightly different use case. Some colleagues who worked on different projects also didn't have a 100% satisfactory solution for this. Are we missing something; is there a way in which we can configure this to work?
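For the second link in particular, my understanding is that it comes down to registering a Microsoft Entra ID access token as a Git credential for the service principal, roughly like the sketch below. This is untested on our side; it assumes the agent is logged in to the Azure CLI as sp-dataplatform, $(sp_databricks_token) is a placeholder for a Databricks token belonging to that SP, and the Entra token expires after roughly an hour, so it would need to be refreshed on a schedule:

- script: |
    # Entra ID access token for Azure DevOps (well-known resource ID 499b84ac-1321-427f-aa17-267ca6975798)
    ADO_TOKEN=$(az account get-access-token \
      --resource 499b84ac-1321-427f-aa17-267ca6975798 \
      --query accessToken -o tsv)
    # Store it as the SP's Git credential in the workspace via the Git Credentials API
    # ($(sp_databricks_token) is a hypothetical variable holding a token for sp-dataplatform)
    curl -s -X POST "$(workspace_url)/api/2.0/git-credentials" \
      -H "Authorization: Bearer $(sp_databricks_token)" \
      -d "{\"git_provider\": \"azureDevOpsServices\", \"personal_access_token\": \"$ADO_TOKEN\"}"
  displayName: "Register Entra token as Git credential for the SP (sketch)"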
Thanks in advance!
07-09-2025 03:06 AM
Hello @LuukDSL ,
Could you share a snippet of your CI/CD YAML file so we can give more specific advice?
I’ve connected Azure DevOps to Databricks using ARM credentials, and the deployed jobs' "Run as" user was set to the service principal; no extra steps were required.
Once you share the snippet, we can suggest the next steps.
Best, Ilir
07-09-2025 06:33 AM
Hi @ilir_nuredini,
Thanks for your response. Do your jobs also run from a Git source of your repository?
In our CD Pipeline.yml for DevOps (in which we use Terraform) we have this stage for the asset bundles:
- stage: DeployDatabricksWorkflows
  displayName: "Deploy Databricks Workflows with Asset Bundles"
  condition: not(failed('ApplyTerraformEnv'))
  variables:
    workspace_url: $[ stageDependencies.OutputTerraformEnv.outputTerraformJob.outputs['readOutputTerraformTask.workspace_url'] ]
    serverless_warehouse_id: $[ stageDependencies.OutputTerraformEnv.outputTerraformJob.outputs['readOutputTerraformTask.serverless_warehouse_id'] ]
  jobs:
    - job: "DeployAssetBundle"
      steps:
        - script: "curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh"
          displayName: "Install Databricks CLI"
          workingDirectory: .
        - script: 'databricks bundle deploy --var="serverless_warehouse_id=$(serverless_warehouse_id)"'
          displayName: "Deploy Asset Bundle"
          workingDirectory: bundle
          env:
            DATABRICKS_HOST: $(workspace_url)
            DATABRICKS_CLIENT_ID: $(client_id)
            DATABRICKS_CLIENT_SECRET: $(auth_secret)
            DATABRICKS_BUNDLE_ENV: $(target_branch)
In our dataplatform_jobs.yml, workflows and tasks are configured with these settings:
source: GIT
run_as: <email of colleague>
git_source:
  git_url: https://our-organisation@dev.azure.com/our-organisation/Dataplatform/_git/dataplatform
  git_provider: azureDevOpsServices
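What we'd like to end up with is roughly this, with run_as pointing to the service principal instead of a user (a sketch; the application ID is a placeholder for sp-dataplatform):

source: GIT
run_as:
  service_principal_name: <application-id of sp-dataplatform>
git_source:
  git_url: https://our-organisation@dev.azure.com/our-organisation/Dataplatform/_git/dataplatform
  git_provider: azureDevOpsServices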
07-09-2025 07:59 AM
Hello @LuukDSL ,
This is how I am connecting to Databricks:
env:
  ARM_TENANT_ID: $(AZURE_SP_TENANT_ID)
  ARM_CLIENT_ID: $(AZURE_SP_APPLICATION_ID)
  ARM_CLIENT_SECRET: $(AZURE_SP_CLIENT_SECRET)
So I am using SP credentials to connect, and the jobs get the SP assigned as both owner and "Run as" user.
And yes, it runs on a Git source of my repository.
Best, Ilir
07-16-2025 04:21 AM
Interesting! Is this also in your CD file where you use databricks bundle deploy? It looks similar to my env part, although you're using AZURE_SP_ variables. I suppose they're also used to make the connection to Databricks?
Also, did you specify somewhere how your SP can make contact with the repo, i.e. via Settings -> Identity and access (workspace admin) -> Service principals (manage), and then via the Git integration tab of your SP?
07-16-2025 05:58 AM
Hello @LuukDSL ,
That’s right, I’m using Azure SP variables to connect to Databricks.
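In the deploy step of the pipeline that looks roughly like this (a sketch; the variable names are just how they are called in my setup, and the workspace URL can also live in the bundle's target configuration):

- script: 'databricks bundle deploy'
  displayName: "Deploy Asset Bundle"
  env:
    DATABRICKS_HOST: $(workspace_url)
    ARM_TENANT_ID: $(AZURE_SP_TENANT_ID)
    ARM_CLIENT_ID: $(AZURE_SP_APPLICATION_ID)
    ARM_CLIENT_SECRET: $(AZURE_SP_CLIENT_SECRET)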
However, the part where the SP connects to the repo happens outside of Databricks (e.g. Azure DevOps).
You don’t need to set up any Git integration for the SP, because once you push your code through DABs, it resides within Databricks; no further connection to Git is needed.
Best, Ilir
07-17-2025 06:20 AM
Hi @ilir_nuredini,
[...] because once you push your code through DABs, it resides within Databricks; no further connection to Git is needed.
I think this is what we might be doing differently. At the bottom of my first reply, I showed how the tasks in our workflows are configured via the dataplatform_jobs.yml file. Because we specify the git_source there, the corresponding Git settings show up in the Jobs UI.
So, on every workflow run, the code is pulled dynamically from the repo. I suppose you're using the DAB in another way, where the whole repo is pushed into the Databricks Workspace itself?
07-17-2025 06:28 AM
That is right, the whole repo (bundle file structure) is pushed to the Databricks Workspace.
07-17-2025 07:42 AM
Can you try generating an OAuth Databricks token for this SP and then passing that token to your databricks bundle deploy step in the env variables section, instead of the client ID and secret?
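Something along these lines, adapted to the step you shared (an untested sketch; it assumes $(auth_secret) is the SP's Databricks OAuth secret, as in your existing step, and uses jq to read the token from the response):

- script: |
    # Request a short-lived OAuth access token for the service principal (M2M flow)
    export DATABRICKS_TOKEN=$(curl -s -X POST "$(workspace_url)/oidc/v1/token" \
      -u "$(client_id):$(auth_secret)" \
      -d "grant_type=client_credentials&scope=all-apis" | jq -r .access_token)
    # Deploy with the token instead of DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET
    databricks bundle deploy --var="serverless_warehouse_id=$(serverless_warehouse_id)"
  displayName: "Deploy Asset Bundle with SP OAuth token"
  workingDirectory: bundle
  env:
    DATABRICKS_HOST: $(workspace_url)
    DATABRICKS_BUNDLE_ENV: $(target_branch)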
07-22-2025 07:52 AM
Thanks for your reply. We use a few different jobs, so that would mean all these jobs would require this task, right? This seems like a rather manual approach for something you would expect to work automatically. Do you agree with that assessment?
Also, isn't the token that's created still a PAT? Or is this different because you use an Azure App ID for the SP? (I believe we only have one App ID, by the way.)
07-25-2025 03:00 PM
Hi,
Even if you have multiple jobs, why isn't your deployment procedure a single one? It should be one common deployment pipeline, right? This is more a matter of your way of working, but it can be standardized.
I am sharing this from my own experience: when we pass the client_id and secret, it seems like Asset Bundles takes the identity of the SP as creator only, but uses the deployer's identity as run_as.
If you use the approach I suggested, the OAuth token we generated carries the identity of the SP for both creator and run_as. I would say give it a try. Thanks
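If you want to verify what each approach produces, you can inspect the deployed job and check the creator and run_as fields in the output (sketch; <job-id> is a placeholder):

databricks jobs get <job-id>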
07-22-2025 07:45 AM
Thanks, that explains a lot! I will experiment with that approach and see if it works well for our use cases. Is there a risk that the code is manually changed by a user? Let's say that the acc-branch is pushed via the bundle file structure when a PR from tst has been merged. Now, the whole repo (on the acc-branch) will be pushed. Can a user now change the repo in Databricks (either accidentally or deliberately), or have you found a way to keep this locked?
07-22-2025 08:21 AM
Hello @LuukDSL ,
Yes, it's definitely possible that someone accidentally changes something in the bundle folder.
To prevent this, you have a couple of options:
- You can restrict access to the folder entirely, or
- You can grant VIEW access only, which means users can see the contents but won’t be able to edit the files without going through the standard process.
In either case, if users have access to the jobs, they'll still be able to run them regardless of their permissions on the actual files.
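On the bundle side, the top-level permissions mapping in the bundle configuration can also make this explicit for everything the bundle deploys (a sketch; the group name is a placeholder):

# databricks.yml (sketch): read-only access on the deployed resources for a group
permissions:
  - level: CAN_VIEW
    group_name: data-engineers

The folder the bundle syncs to can additionally be locked down with normal workspace folder permissions, in line with the options above.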
07-17-2025 07:58 AM
I have provided a solution to your problem above; give it a try and share your feedback. Thanks
08-07-2025 01:12 PM
Hi @LuukDSL, have you tried the solution I provided above?