3 weeks ago
Hi everyone,
We are exploring the notebooks-first development approach with Databricks Bundles, and we've run into a workspace-permissions challenge involving Service Principals.
Our notebooks live under a personal path such as /Workspace/Users/<user_email>/project/notebook.
A Service Principal cannot access user workspace paths such as:
/Workspace/Users/<user_email>/...
So the SP has no way to read or execute the notebook, and therefore cannot run the job.
How should we structure our workspace, Git folders, or permissions so the Service Principal can run Bundle-based jobs, without granting SP access to personal user directories?
3 weeks ago - last edited 3 weeks ago
Hi @DineshOjha,
This is a good question, and researching this helped me learn some best practices along the way. What you're seeing is actually expected behaviour: service principals aren't meant to execute notebooks directly from users' personal workspace paths. That limitation is by design, for security and isolation reasons.
Given you're using Databricks Bundles and a notebooks-first workflow, the recommended pattern is to treat Git as the source of truth. Developers can work on notebooks under their own /Workspace/Users/... paths (or locally) for convenience, then sync them to Git (via Git folders / Repos). Those copies in personal home directories should be considered development artefacts only, not what production jobs execute. In production, jobs should use notebooks deployed from Git into a shared workspace path, or reference Git directly (using jobs with a Git-based notebook source).
Instead of pointing jobs to /Workspace/Users/..., configure your bundle target so that it deploys notebooks into a shared folder where the service principal has at least read/execute access and your team can still inspect the deployed artefacts.
For example, in your bundle:
```yaml
targets:
  prod:
    workspace:
      host: https://<your-workspace-url>
      root_path: /Workspace/Shared/projects/my-project
```
When you run `databricks bundle deploy` (ideally from CI/CD, authenticated as the service principal), the notebooks defined in the bundle are materialised under /Workspace/Shared/projects/my-project/... Your bundle's jobs should reference those deployed notebook paths, not the originals under /Workspace/Users/....
On the Databricks side, you'll typically want the service principal to have at least read (and execute) access on the deployed folder, and to be the owner or run-as identity of the jobs. With this setup, developers continue to use their personal workspace areas for development, Git remains the source of truth, and the service principal only interacts with the shared, deployed artifacts; it never needs access to /Workspace/Users/....
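As a hedged illustration, that kind of permissions setup can be declared in the bundle itself; the group name and application ID below are placeholders, not values from this thread:

```yaml
# Illustrative bundle-level permissions sketch; IDs and group names are placeholders.
# Bundles apply this mapping to the deployed resources (and deployment folder).
permissions:
  - service_principal_name: "<sp-application-id>"   # run identity; needs to read deployed files
    level: CAN_MANAGE
  - group_name: "<dev-team-group>"                  # humans who inspect deployed artefacts
    level: CAN_VIEW
```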
If you prefer to be fully Git-centric, you can also configure jobs to pull notebooks directly from Git (e.g. via Repos/git_source) and grant the service principal access to the Git repo, plus job permissions as above. However, the core principle is the same in both approaches: don't run production jobs against notebooks in /Workspace/Users/.... Use Git as the source of truth, and deploy or reference notebooks from a shared, service-principal-readable location.
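For illustration, a Git-backed job in a bundle might look roughly like this; the org/repo URL, job name, and notebook path are placeholders, not from this thread:

```yaml
resources:
  jobs:
    my_git_job:                        # hypothetical job name
      name: my-git-job
      git_source:
        git_url: https://dev.azure.com/<org>/<project>/_git/<repo>
        git_provider: azureDevOpsServices
        git_branch: main
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: notebooks/etl   # path relative to the repo root
            source: GIT                    # pull from Git at run time, not the workspace
```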
Hope that helps clarify the pattern.
Please let me know if any of the above is unclear.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
3 weeks ago
Thank you so much for your response.
We don't want to keep the notebooks under Shared or run our jobs pointing to the Shared location. We have more than 200 applications and different teams working on them. Each application has a service principal associated with it, and only that service principal has access to the specific application's volume and schema.
Based on your response, we are planning to follow the below approach.
1. Create notebooks under personal user account
2. Push the code to GIT
3. Deploy using bundles
4. In the bundles, provide run_as as the service principal so that the jobs are owned and run using the service principal.
Questions:
1. Do you think this is a good approach for notebook based implementation or do you suggest anything else?
2. The service principal exists only in Databricks, so what email and PAT should be provided to enable GIT access?
3. How will the service principal get access to the Azure GIT repo (ADO repository)?
4. Is there any other access that the service principal needs for this approach, for bundles etc ?
3 weeks ago
Hi @DineshOjha,
Given your constraints (per-application service principals, isolation at the volume/schema level, and not wanting to use /Workspace/Shared), the flow you described aligns with how Bundles are meant to be used in production. Bundles are the recommended CI/CD mechanism, and using service principals as run identities in non-dev targets is explicitly encouraged.
A couple of clarifications and direct answers to your questions:
1. Do you think this is a good approach for notebook based implementation or do you suggest anything else?
Yes, this is a solid pattern for notebook-based implementations: Git serves as the source of truth, with personal workspaces intended solely for development. Bundles deploy notebooks and job definitions into the workspace, and in non-development targets the run_as parameter is configured to use the per-application service principal, so all production runs use that principal's permissions, including access to the appropriate volume/schema. That is critical for maintaining consistency and security throughout the deployment process.
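For reference, a minimal run_as wiring for a non-dev target might look like this; the application ID is a placeholder:

```yaml
targets:
  prod:
    mode: production
    run_as:
      service_principal_name: "<app-sp-application-id>"  # per-application SP
```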
The only design choice you still need is where in the workspace Bundles deploy to. You don't have to use /Workspace/Shared. You can pick any isolated path, for example /Workspace/.bundle/prod/${bundle.name} or /Workspace/Projects/<app_name>/..., and lock that path down so only the application service principal, a small operator group, and optionally CI/CD deployer principals have access. The path naming is up to you. Bundles just need a root_path per target, and you control the permissions there.
So I would keep your 4-step approach and add a per-app workspace root (instead of /Shared), with ACLs granting access only to the relevant SP and operators.
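A per-app workspace root could be sketched in the target config like this; the path convention and <app_name> placeholder are illustrative, not prescribed:

```yaml
targets:
  prod:
    workspace:
      # One isolated root per application; lock this folder down with ACLs.
      root_path: /Workspace/Projects/<app_name>/.bundle/${bundle.name}/${bundle.target}
```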
2. The service principal exists only in Databricks, so what email and PAT should be provided to enable GIT access?
With the Bundles-from-Azure-DevOps pattern you described, the important nuance is that your Databricks service principal does not need to talk directly to Git to make this work. In a typical Azure DevOps setup, Azure DevOps pipelines clone the Git repo themselves using the identity configured in DevOps (service connection, PAT, or Microsoft Entra-backed principal). Once the code is on the build agent, the pipeline calls `databricks bundle validate/deploy/run` using the Databricks service principal to authenticate to Databricks, not to Git.
In that model, you do not need to configure a Git email/PAT on the Databricks SP at all. Git credentials live entirely in Azure DevOps (for checking out the repo). The Databricks SP is only used for workspace authentication (via OAuth M2M, workload identity federation, or an ARM service connection). You only need Git credentials on the Databricks SP if you also want it to use Git folders / Repos in the workspace, or run Git-backed jobs directly from Databricks (using Git-with-jobs / Git folders).
In that case, the email/PAT would belong to a non-human Azure DevOps identity (service principal or technical user) that has access to the repo. You then link those Git credentials to the Databricks SP via the Git integration tab in the workspace.
3. How will service principal get access to the Azure GIT repo (ADO repository)?
In a two-layer setup, the first layer involves Azure DevOps and a Git repository. In this configuration, you create a service principal or technical user with at least Basic access and repository permissions in Azure DevOps. This identity is utilised for your pipelines to check out the code, and it is managed within Azure DevOps, not in Databricks.
The second layer connects Azure DevOps to Databricks through a Databricks service principal. To set this up, you configure an Azure DevOps service connection that authenticates to Databricks using methods such as OAuth M2M, Azure Resource Manager connection, or the recommended workload identity federation (which avoids long-lived secrets). Your pipeline steps will involve commands like `databricks bundle validate -t prod`, `databricks bundle deploy -t prod`, and `databricks bundle run -t prod <job_name>`, with the Databricks CLI already authenticated as the service principal.
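As an illustrative sketch of the pipeline half of this (the variable names and variable values are assumptions, not from this thread), using OAuth M2M environment variables for the Databricks CLI:

```yaml
# azure-pipelines.yml fragment; the Databricks CLI authenticates as the SP via OAuth M2M.
steps:
  - checkout: self                 # DevOps identity clones the repo; no Databricks Git creds involved
  - script: |
      databricks bundle validate -t prod
      databricks bundle deploy -t prod
      databricks bundle run -t prod <job_name>
    displayName: Deploy and run bundle
    env:
      DATABRICKS_HOST: https://<your-workspace-url>
      DATABRICKS_CLIENT_ID: $(databricks-sp-client-id)         # assumed pipeline variable names
      DATABRICKS_CLIENT_SECRET: $(databricks-sp-client-secret)
```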
For a Bundles-only flow, the Databricks service principal does not require direct Git access; it is only used to authenticate the CLI/API calls from the pipeline. However, if you want the Databricks service principal to operate on Git folders within the workspace, you must grant the DevOps identity access to the repository (Basic + repo permissions) and link its Git credentials to your Databricks service principal under the settings for Git integration with Azure DevOps (using PAT or Entra-based authentication).
4. Is there any other access that the service principal needs for this approach, for bundles etc ?
For the exact permission model and how to wire this up, the official docs cover it in more detail:
run_as configuration (how to set the SP as the run identity in targets)
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
2 weeks ago
Thank you so much Ashwin, this provides a lot of clarity.
1. Where to deploy Bundles in the workspace
We plan to deploy the bundle using a service principal, so we plan to deploy the bundle under /Workspace/<service_principal>.
1. Create notebooks under personal user account
2. Create jobs as .yml files to call these notebooks
3. Push the code to GIT
4. Create bundles
5. Deploy the bundle using azure pipelines using the service principal
This would deploy the bundle under the service_principal account and make it the owner of the jobs as well.
These jobs would later be executed via a separate scheduling tool called Control-M.
2. Source as Azure GIT repo vs Workspace
From your response we understand that the service principal needs access to GIT if the source type of our jobs is GIT. But if we define jobs with source: WORKSPACE, the service principal need not have access to GIT.
As these are 2 separate approaches (1. source type as GIT and 2. source as Workspace), is there a benefit of one approach over the other?
3. CI/CD using DAB
We are currently using the python wheel approach, in which we run the pytests as part of the Azure pipeline.
When we are using DAB, what's the best process to run these pytests?
In some places it's mentioned that these tests need to be run as a separate job. I didn't find a place that defines the best practices for these pytests when deploying notebooks using DAB.
4. Notebooks vs python tasks
If we are deploying purely a Python script, is there a recommendation for using one over the other?
In a python wheel approach, we define an entry point, but we don't see an option to do that with notebooks, hence the need to call the main function explicitly. Is that the correct approach?
5. Also, for some reason the links that you provided are not opening correctly; not sure if something got changed while pasting them.
Thank you again for your support, highly appreciate you taking the time to research and respond.
Thanks
Komal
2 weeks ago
Hi @DineshOjha,
Updated links below. Will respond to your queries before the end of this week.
run_as configuration (how to set the SP as the run identity in targets)
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
Saturday
Hi @DineshOjha,
I lost track of this... just remembered. Please see the responses below.
1. Where to deploy Bundles in the workspace
Your proposed flow is perfectly compatible with Bundles and CI/CD best practices. On the workspace location: technically, you can set the target's root_path to something like /Workspace/<service_principal>/<app_name> and deploy there, as long as the deploying identity (CI/CD SP) has permission to write into that path and humans who need to debug (e.g., the app team) have at least read access.
The Bundles docs commonly show a pattern like /Workspace/.bundle/${bundle.target}/${bundle.name} (or a similar structured path), which you then secure. So structurally, you have two good options: a per-SP home with per-app subfolders (/Workspace/<service_principal>/<app_name>), or a neutral "system" root for all bundles (/Workspace/.bundle/prod/${bundle.name}).
From Databricks' perspective, both are fine as long as the ACLs are correct. For a large estate (200+ apps), the neutral .bundle namespace tends to age better for discoverability and governance, but your per-SP approach is not wrong; it's more of an org-convention choice.
Running the jobs from Control-M is also fine. You're just triggering Databricks jobs via API, and the location of deployed assets doesn't change that.
2. Source as Azure GIT repo vs Workspace
Your understanding is correct. If a job is configured with source = Git (Git-with-jobs, or Git folders), then the Databricks identity that pulls from Git (user or SP) needs Git credentials/permissions. If a job is configured with source = Workspace (tasks point at workspace notebook paths), and Azure DevOps does the git checkout and then calls `databricks bundle deploy`, then the Databricks service principal does not need Git access: DevOps talks to Git, and the SP only talks to Databricks.
Bundles already assume Git is your source of truth and handle deployment from the checkedโout repo into the workspace. In that model, itโs very common to use WORKSPACE as the job source (tasks reference the deployed notebook/script paths), and let Bundles + CI/CD ensure that workspace state is in sync with Git.
With the workspace source (with Bundles) approach, the simpler mental model is Git → CI/CD → Bundles → workspace; jobs read from the workspace. You also get the full power of Bundles: targets, run_as, permissions, deployment modes, etc. And there is no need to manage Git credentials on the Databricks SP unless you also use Git folders directly.
The Git source (Git-with-jobs) approach is most useful if you aren't using Bundles and want jobs to pull from Git directly at run time. It supports more limited job/task types, and the job configuration itself isn't in source control in the same way as with Bundles.
Given you are standardising on Bundles and already using Azure DevOps, you may want to consider workspace source for jobs (deployed by Bundles), and keep Git access concentrated in Azure DevOps and any interactive developer identities.
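For illustration, a workspace-source task in a bundle might look like this; the job name and file layout are assumed, and Bundles rewrite the relative path to the deployed workspace path:

```yaml
resources:
  jobs:
    app_job:                          # hypothetical job
      name: app-job
      tasks:
        - task_key: main
          notebook_task:
            # Relative to this YAML file; resolves under the target's root_path after deploy.
            notebook_path: ../src/etl_notebook.py
```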
3. CI/CD using DAB
The core best practice doesnโt change with Bundles. Keep unit tests (pytest) in your CI system, close to the code. This is still the primary mechanism for fast feedback and correctness, regardless of Bundles.
What Bundles add is a good place for integration tests where you define a test/run-unit-tests job as a resource inside the bundle (for example a small job that runs a test notebook or a script calling your wheel).
The official Azure DevOps + Bundles example shows this pattern: build/test artifact, then deploy bundle, then run a test job from the bundle. So: keep pytests in Azure Pipelines as you do today, and optionally add bundle-defined test jobs for integration/end-to-end checks.
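A hedged sketch of such a bundle-defined test job; the job name and file path are placeholders, and the wrapper script is an assumption:

```yaml
resources:
  jobs:
    run_integration_tests:            # hypothetical test job defined inside the bundle
      name: run-integration-tests
      tasks:
        - task_key: pytest
          spark_python_task:
            # Thin wrapper script that invokes pytest against the deployed code.
            python_file: ../tests/run_integration_tests.py
```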
4. Notebooks vs python tasks
A common pattern that fits what you're doing today: keep all real logic in a wheel (or at least a Python package). In jobs, either run a python_wheel_task directly (no notebook at all), or use a very thin notebook that imports your wheel and calls main() with parameters.
That gives you the best of both worlds. Testability and CI friendliness from the wheel, plus optional notebook ergonomics when you want them.
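As an illustrative sketch of the wheel-first variant (package name, entry point, and wheel path are assumptions):

```yaml
resources:
  jobs:
    wheel_job:                        # hypothetical job running the wheel directly
      name: wheel-job
      tasks:
        - task_key: run_wheel
          python_wheel_task:
            package_name: my_app      # name declared in the wheel's setup/pyproject
            entry_point: main         # console-script entry point in the wheel
          libraries:
            - whl: ../dist/*.whl      # wheel built by the bundle's artifacts section
```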
Hope this helps.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.