
Service Principal with Federated Credentials Can’t Access Full Repo in ADO

PearceR
New Contributor III

Good Afternoon,

I’m using Databricks with Git integration to Azure DevOps (ADO).

  • Authentication is via Microsoft Entra federated credentials for a service principal (SPN).
  • The SPN has Basic access in ADO, is in the same project groups as my user, and Git integration is configured successfully.
  • When I run jobs as my user, the entire repo is accessible.
  • When I run jobs as the SPN, only the entry file specified in the job runs; other files in the repo are not accessible.
  • The job source is set to a branch, not a single file.

Why does the SPN only get access to the entry file and not the full repo? Is this expected behavior for Git source jobs, or am I missing a configuration step to allow full repo checkout when using a service principal? Has anyone else experienced similar issues?

I followed steps outlined here: Use a Microsoft Entra service principal for automation with Azure Databricks Git folders - Azure Dat...

from this release: OAuth 2.0 Git credential support for Service Principals is now Generally Available | Databricks Blog

Thanks for reading!

1 REPLY

AbhaySingh
Databricks Employee

The issue stems from a fundamental architectural difference in how Databricks handles Git authentication:

 

  1. Git Credential Gap: While your SPN successfully authenticates to Databricks via Microsoft Entra federated credentials, it lacks the secondary Git credentials needed to clone the full repository from Azure DevOps at job runtime.

  2. Runtime Behavior: When using "Git source" for jobs, Databricks attempts to clone the repository fresh for each run. SPNs don't have personal Git credentials stored like regular users do, so only the specified entry file is accessible.

 

  This is NOT a Configuration Error

 

  Your setup appears correct:
  - SPN has Basic access in ADO
  - SPN is in the same project groups
  - Git integration shows as configured successfully
  - Job source is set to a branch

 

  The limitation is that Microsoft Entra federated credentials authenticate the SPN to Databricks but don't automatically provide Git repository access credentials for Azure DevOps.
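
One way to check this explanation is to list the Git credentials that Databricks has stored for the SPN. The sketch below assumes the Git Credentials REST endpoint (`GET /api/2.0/git-credentials`) and a token issued to the SPN itself; `host`, `token`, and both function names are placeholders chosen for illustration.

```python
import json
import urllib.request


def list_git_credentials(host: str, token: str) -> dict:
    """Call GET /api/2.0/git-credentials as the identity that owns `token`."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/git-credentials",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_git_credentials(payload: dict) -> list[str]:
    """Return one 'provider (username)' string per credential in the response."""
    return [
        f"{cred.get('git_provider', 'unknown')} ({cred.get('git_username', 'no username')})"
        for cred in payload.get("credentials", [])
    ]
```

An empty list for the SPN would be consistent with the explanation above. A populated list would suggest the cause lies elsewhere.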

 

  Recommended Solutions (In Order of Preference)

 

  1. Use Workspace Files Instead of Git Source (Most Reliable)

 

  Switch your job configuration from "Git source" to "Workspace files":
  - Deploy the code to the Databricks workspace from a CI/CD pipeline
  - No Git authentication is needed at runtime
  - Every file deployed to the workspace is available to the job
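
As a rough illustration of that switch, the sketch below rewrites a Jobs API 2.1 style settings dictionary so that tasks read from the workspace instead of Git. It assumes the job uses `notebook_task` or `spark_python_task` with Git-relative paths and that a CI/CD pipeline has already copied the repo to `workspace_root`. The function name and the example paths are hypothetical.

```python
import copy
import posixpath


def to_workspace_source(job_settings: dict, workspace_root: str) -> dict:
    """Return a copy of a Git-source job definition that reads from the workspace."""
    settings = copy.deepcopy(job_settings)
    # Without git_source, no repository checkout is attempted at run time.
    settings.pop("git_source", None)

    path_keys = {"notebook_task": "notebook_path", "spark_python_task": "python_file"}
    for task in settings.get("tasks", []):
        for task_type, path_key in path_keys.items():
            spec = task.get(task_type)
            # Assumption: an unset source on a Git-source job means "GIT".
            if spec is None or spec.get("source", "GIT") != "GIT":
                continue
            spec["source"] = "WORKSPACE"
            # Git paths are repo-relative; workspace paths must be absolute.
            spec[path_key] = posixpath.join(workspace_root, spec[path_key].lstrip("/"))
    return settings
```

For example, a task with `python_file` set to `jobs/entry.py` and a `workspace_root` of `/Shared/deployed/repo` would end up pointing at `/Shared/deployed/repo/jobs/entry.py`.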

 

  2. Manual Repository Clone in Entry Script (Quick Fix)

 

  Add this to your entry file:

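  import shutil
  import subprocess
  import sys

  # Read the PAT from a Databricks secret scope rather than hard-coding it
  pat_token = dbutils.secrets.get(scope="ado-secrets", key="pat-token")
  repo_url = "https://dev.azure.com/your-org/project/_git/repo"
  clone_dir = "/tmp/repo"

  # Embed the PAT in the clone URL (never print or log this value)
  repo_url_with_auth = repo_url.replace("https://", f"https://{pat_token}@")

  # Remove any clone left over from a previous run, then clone the branch
  shutil.rmtree(clone_dir, ignore_errors=True)
  result = subprocess.run(["git", "clone", "--depth", "1", "--branch", "main",
                           repo_url_with_auth, clone_dir],
                          capture_output=True, text=True)
  if result.returncode != 0:
      # Report only the exit code so the token-bearing URL is not echoed
      raise RuntimeError(f"git clone failed with exit code {result.returncode}")

  # Add to Python path
  sys.path.insert(0, clone_dir)

  # Now import your modules
  from your_module import your_function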
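
A variation on the clone step, sketched here as an assumption and not as part of the original answer, keeps the PAT out of the repository URL by sending it in an HTTP header through git's `http.extraHeader` setting. The helper names `build_clone_command` and `redact` are made up for illustration.

```python
import base64


def build_clone_command(repo_url: str, pat: str, branch: str, dest: str) -> list[str]:
    """Build a shallow `git clone` command that sends the PAT as a Basic auth header."""
    # The header value is base64 of ":<PAT>", i.e. an empty username plus the PAT.
    encoded = base64.b64encode(f":{pat}".encode()).decode()
    return [
        "git", "-c", f"http.extraHeader=Authorization: Basic {encoded}",
        "clone", "--depth", "1", "--branch", branch, repo_url, dest,
    ]


def redact(command: list[str]) -> list[str]:
    """Return a copy of the command that is safe to print or log."""
    return [
        "http.extraHeader=<redacted>" if arg.startswith("http.extraHeader=") else arg
        for arg in command
    ]
```

The resulting list can be passed to `subprocess.run`, and `redact(command)` can be logged in its place. The token is still visible in the process arguments while git runs, so this reduces exposure in URLs and logs without removing it entirely.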

 

  3. Use a Dedicated Service User (If SPNs Must Be Avoided)

 

  Create a dedicated Databricks user account (not SPN) specifically for automated jobs with proper Git credentials configured.

 

  Action Items

 

  1. Immediate: Implement Solution #2 (manual clone) to unblock your work
  2. Short-term: Test Solution #1 (workspace files) in a dev environment
  3. Long-term: Monitor Databricks releases for enhanced SPN Git credential support

 

  Verification Script

 

  Run this in your job to confirm the behavior:

  import os
  import subprocess

  print("=" * 50)
  print("DEBUGGING GIT ACCESS")
  print("=" * 50)
  print(f"Current directory: {os.getcwd()}")
  print(f"Directory contents: {os.listdir('.')}")
  # current_user() reports the identity the job is running as
  print(f"Running as: {spark.sql('SELECT current_user()').first()[0]}")

  # Check Git status
  result = subprocess.run(['git', 'status'], capture_output=True, text=True)
  if result.returncode == 0:
      print("Git repository detected")
      print(f"Git status: {result.stdout}")
  else:
      print("Not in a Git repository")
      print(f"Error: {result.stderr}")
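
To make the "only the entry file is accessible" symptom explicit, the check can be extended with a list of files you expect to see next to the entry file. This is a minimal sketch; `missing_files` and the example paths are hypothetical and should be replaced with paths from your own repo.

```python
import os


def missing_files(root: str, expected: list[str]) -> list[str]:
    """Return the expected relative paths that do not exist under `root`."""
    return [rel for rel in expected if not os.path.exists(os.path.join(root, rel))]


if __name__ == "__main__":
    # Hypothetical paths relative to the directory the job starts in.
    expected = ["your_module.py", "utils/helpers.py"]
    print(f"Missing from {os.getcwd()}: {missing_files(os.getcwd(), expected)}")
```

Running the same check once as your user and once as the SPN shows which files differ between the two runs.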

 

  This issue is a known limitation in the Databricks platform rather than a misconfiguration on your part. The workarounds above are standard practice for organizations using SPNs with Databricks Git-source jobs.
