Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Running jobs as service principal, while pulling code from Azure DevOps

LuukDSL
New Contributor III

In our Dataplatform, our jobs are defined in a dataplatform_jobs.yml within a Databricks Asset Bundle, and then pushed to Databricks via an Azure Devops Pipeline (Azure Devops is where our codebase resides). Currently, this results in workflows looking like this, where they're created by the Dataplatform Service Principal, but are run as the username of a specific colleague: 

[screenshot: LuukDSL_0-1751983798686.png - job created by the Dataplatform Service Principal, but run as a colleague's user account]

We'd like to change this so that "Run as" is also the Service Principal. That would make maintenance easier, and we wouldn't run into trouble if, for example, this colleague leaves the team. However, our workflows are connected to our DevOps repo and run on the latest version of our dev/test/acc/prd branch. As a user this works fine, because the PAT of that specific user is used for authentication. If we change it to sp-dataplatform, we run into authentication issues.
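What we're aiming for is roughly this at the top level of our databricks.yml (just a sketch; the application ID below is a placeholder for sp-dataplatform):

run_as:
  service_principal_name: "00000000-0000-0000-0000-000000000000"  # placeholder: application ID of sp-dataplatform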

We could add a PAT for sp-dataplatform manually, but then this is still tied to a specific user account. This doesn't really solve the issue. 

We also tried the Azure DevOps Services (Azure Active Directory) option for Git integration on the service principal, but I believe this is only used to pull Databricks repos to DevOps, rather than the other way around?

There are a lot of links and threads related to this, such as:
https://community.databricks.com/t5/data-engineering/use-azure-service-principal-to-access-azure-dev...
https://learn.microsoft.com/en-us/azure/databricks/repos/automate-with-ms-entra
https://learn.microsoft.com/en-us/azure/databricks/jobs/how-to/run-jobs-with-service-principals
https://community.databricks.com/t5/data-engineering/run-task-as-service-principal-with-code-in-azur... 

I've experimented with these options as mentioned, but I think they all serve a slightly different use case. Some colleagues who worked on different projects also didn't have a 100% satisfactory solution for this. Are we missing something; is there a way in which we can configure this to work?

Thanks in advance!

14 REPLIES

ilir_nuredini
Honored Contributor

Hello @LuukDSL ,

Could you share a snippet of your CI/CD YAML file so we can give more specific advice?
I’ve connected Azure DevOps to Databricks using ARM credentials, and that set the job's "Run as" user to the service principal; no extra steps were required.

Once you share the snippet, we can suggest the next steps.

Best, Ilir

Hi @ilir_nuredini,

Thanks for your response. Does your pipeline also run on a git source of your repository?

In our CD Pipeline.yml for Devops (in which we use Terraform) we have this stage for the asset bundles:
- stage: DeployDatabricksWorkflows
  displayName: "Deploy Databricks Workflows with Asset Bundles"
  condition: not(failed('ApplyTerraformEnv'))
  variables:
    workspace_url: $[ stageDependencies.OutputTerraformEnv.outputTerraformJob.outputs['readOutputTerraformTask.workspace_url'] ]
    serverless_warehouse_id: $[ stageDependencies.OutputTerraformEnv.outputTerraformJob.outputs['readOutputTerraformTask.serverless_warehouse_id'] ]
  jobs:
    - job: "DeployAssetBundle"
      steps:
        - script: "curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh"
          displayName: "Install Databricks CLI"
          workingDirectory: .

        - script: 'databricks bundle deploy --var="serverless_warehouse_id=$(serverless_warehouse_id)"'
          displayName: "Deploy Asset Bundle"
          workingDirectory: bundle
          env:
            DATABRICKS_HOST: $(workspace_url)
            DATABRICKS_CLIENT_ID: $(client_id)
            DATABRICKS_CLIENT_SECRET: $(auth_secret)
            DATABRICKS_BUNDLE_ENV: $(target_branch)

 

In our dataplatform_jobs.yml, workflows and tasks are configured with these settings:

source: GIT
run_as: <email of colleague>
git_source:
  git_url: https://our-organisation@dev.azure.com/our-organisation/Dataplatform/_git/dataplatform
  git_provider: azureDevOpsServices

Hello @LuukDSL ,

This is how I am connecting to Databricks:

env:
  ARM_TENANT_ID: $(AZURE_SP_TENANT_ID)
  ARM_CLIENT_ID: $(AZURE_SP_APPLICATION_ID)
  ARM_CLIENT_SECRET: $(AZURE_SP_CLIENT_SECRET)

So I am using the SP credentials to connect, and the jobs get the SP assigned as both owner and "Run as" user.
And yes, it runs on the Git source of my repository.

Best, Ilir

Interesting! Is this also in your CD file where you use databricks bundle deploy? It looks similar to my env part, although you're using AZURE_SP_ variables. I suppose they're also used to make the connection to Databricks?

Also, did you specify somewhere how your SP can connect to the repo, i.e. via Settings -> Identity and access (workspace admin) -> Service principals (manage), and then the Git integration tab of your SP?

Hello @LuukDSL ,

That’s right, I’m using the Azure SP variables to connect to Databricks.
However, the part where the SP connects to the repo happens outside of Databricks (i.e. in Azure DevOps).
You don’t need to set up any Git integration for the SP, because once you push your code through DABs, it resides within Databricks, so no further connection to Git is needed.
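For illustration, a task in my jobs YAML looks roughly like this (a sketch with example names; there is no git_source block, and the notebook path points at the file deployed with the bundle):

tasks:
  - task_key: example_task
    notebook_task:
      notebook_path: ../src/example_notebook.py  # example path, relative to the bundle root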

Best, Ilir

Hi @ilir_nuredini,

[...] because once you push your code through DABs, it resides within Databricks, no further connection to Git is needed.

I think this is what we might be doing differently. At the bottom of my first reply, I specified how the tasks in our workflows are configured via the dataplatform_jobs.yml file. Because we specify the git-source there, it leads to this configuration in the Jobs UI:

[screenshot: LuukDSL_0-1752758187672.png - job task configured with a Git source (provider: Azure DevOps Services)]

So, on every workflow run the code is pulled dynamically. I suppose you're using the DAB in another way, where the whole repo is pushed to a repo in the Databricks Workspace?

That is right, the whole repo (bundle file structure) is pushed to the Databricks Workspace.

Can you try generating an OAuth Databricks token for this SP and then passing that token to your databricks bundle deploy step in the env variables section, instead of the client ID and secret?

 
 
 
Add the following to your parameters (or your preferred way of passing deployment values):

  - name: sp_app_id_dev
    displayName: Service Principal App ID DEV (for oauth token)
    type: string
    default: ""

  - name: sp_app_id_acc
    displayName: Service Principal App ID ACC (for oauth token)
    type: string
    default: ""

  - name: sp_app_id_prd
    displayName: Service Principal App ID PRD (for oauth token)
    type: string
    default: ""
##############################################################
Add this job as the first job:
######################
- job: oauth_bearer_token_sp
  steps:
    - script: |
        wget https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux32 -O $(Build.Repository.LocalPath)/jq
        chmod +x $(Build.Repository.LocalPath)/jq
      displayName: Install jq
      condition: succeeded()
    - script: |
        # Pick the SP credentials and workspace URL for the target environment
        if [[ "${{ variables.env }}" == "dev" ]]
        then
          CLIENT_ID=${{ parameters.sp_app_id_dev }}
          CLIENT_SECRET=$SP_SECRET_DEV
          DATABRICKS_WORKSPACE_URL=${{ parameters.databricks_wrkspc_url_dev }}
        elif [[ "${{ variables.env }}" == "acc" ]]
        then
          CLIENT_ID=${{ parameters.sp_app_id_acc }}
          CLIENT_SECRET=$SP_SECRET_ACC
          DATABRICKS_WORKSPACE_URL=${{ parameters.databricks_wrkspc_url_acc }}
        else
          CLIENT_ID=${{ parameters.sp_app_id_prd }}
          CLIENT_SECRET=$SP_SECRET_PRD
          DATABRICKS_WORKSPACE_URL=${{ parameters.databricks_wrkspc_url_prd }}
        fi
        DATABRICKS_URL="$DATABRICKS_WORKSPACE_URL/api/2.0/token/create"

        # Request a Microsoft Entra ID access token for the Databricks resource via the client-credentials flow
        access_token_val=$(curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
                       https://login.microsoftonline.com/af73baa8-f594-4eb2-a39d-93e96cad61fc/oauth2/v2.0/token \
                       -d "client_id=$CLIENT_ID" \
                       -d 'grant_type=client_credentials' \
                       -d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
                       -d "client_secret=$CLIENT_SECRET")
        access_token=$(jq -r '.access_token' <<< "$access_token_val")

        # Exchange the Entra ID token for a Databricks workspace token owned by the service principal
        api_response=$(curl -X POST $DATABRICKS_URL \
                      -H "Authorization: Bearer $access_token" \
                      -H "X-Databricks-Azure-SP-Management-Token: $access_token" \
                      -d '{"comment": "pipeline token"}')
        DATABRICKS_NEW_TOKEN=$(jq -r '.token_value' <<< "$api_response")
        if [ -z "${DATABRICKS_NEW_TOKEN}" ]
        then
          echo "Token could not be created"
          exit 1
        else
          echo "Successfully created a Databricks Token"
          echo "##vso[task.setvariable variable=DATABRICKS_TOKEN;isOutput=true]$DATABRICKS_NEW_TOKEN"
          echo "##vso[task.setvariable variable=ACCESS_TOKEN;isOutput=true]$access_token"
        fi
      displayName: 'Create oauth token'
      name: oauth
      condition: succeeded()
 
####################
Pass this DATABRICKS_TOKEN to the next stage or job as a variable:
 
 
variables:
  DATABRICKS_TOKEN: $[ dependencies.oauth_bearer_token_sp.outputs['oauth.DATABRICKS_TOKEN'] ]
 
###############
 
Use this DATABRICKS_TOKEN as an env variable for the asset bundle deploy script.
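For example, the deploy step shared earlier would then look roughly like this (a sketch reusing the variable names from your pipeline):

- script: 'databricks bundle deploy --var="serverless_warehouse_id=$(serverless_warehouse_id)"'
  displayName: "Deploy Asset Bundle"
  workingDirectory: bundle
  env:
    DATABRICKS_HOST: $(workspace_url)
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
    DATABRICKS_BUNDLE_ENV: $(target_branch)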

LuukDSL
New Contributor III

Thanks for your reply. We use a few different jobs, so that would mean all of these jobs would require this task, right? This seems like a rather manual approach for something you'd expect to happen automatically. Do you agree with that assessment?

Also, isn't the token that's created still a PAT? Or is this different because you use an Azure App ID for the SP? (I believe we only have one App ID, by the way.)

saurabh18cs
Honored Contributor

Hi,

Even if you have multiple jobs, why isn't your deployment procedure a single one? It should be one common deployment pipeline, right? This is more about your way of working, but it can be standardized.

I am sharing this from my own experience: when we pass client_id and secret, it seems that Asset Bundles only takes the SP's identity as the creator, but uses the deployer's identity as run_as.

If you use the approach I suggested, the OAuth token we generate carries the identity of the SP for both the creator and run_as. I would say give it a try. Thanks
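If you want to double-check after deploying, something like this with the Databricks CLI should show both identities (just a sketch, replace the job ID with your own):

databricks jobs get 123456789
# the response should include creator_user_name and run_as_user_name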

Thanks, that explains a lot! I will experiment with that approach and see if it works well for our use cases. Is there a risk that the code is manually changed by a user? Let's say the acc branch is pushed via the bundle file structure once a PR from tst has been merged. Now the whole repo (on the acc branch) is pushed. Can a user then change the repo in Databricks (either accidentally or deliberately), or have you found a way to keep it locked down?

Hello @LuukDSL ,

Yes, it's definitely possible that someone accidentally changes something in the bundle folder.

To prevent this, you have a couple of options:

  • You can restrict access to the folder entirely, or

  • You can grant VIEW access only, which means users can see the contents but won’t be able to edit the files without going through the standard process.

In either case, if users have access to the jobs, they'll still be able to run them regardless of their permissions on the actual files.
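If you prefer to manage this from the bundle itself, a rough sketch using the top-level permissions mapping in databricks.yml (the group name is just an example) could look like:

permissions:
  - level: CAN_VIEW
    group_name: users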

saurabh18cs
Honored Contributor

I have provided a solution to your problem; give it a try and share your feedback. Thanks

saurabh18cs
Honored Contributor

Hi @LuukDSL, have you tried the solution I provided above?