The issue stems from how Databricks separates workspace authentication from Git provider authentication:
1. Git Credential Gap: While your SPN successfully authenticates to Databricks via Microsoft Entra federated
credentials, it lacks the secondary Git credentials needed to clone the full repository from Azure DevOps at job
runtime.
2. Runtime Behavior: When a job uses "Git source", Databricks clones the repository fresh for each run using
the Git credentials of the principal the job runs as. SPNs don't have personal Git credentials stored the way
regular users do, so the run ends up with only the specified entry file instead of the full repository.
This is NOT a Configuration Error
Your setup appears correct:
- ✅ SPN has Basic access in ADO
- ✅ SPN is in the same project groups
- ✅ Git integration shows as configured successfully
- ✅ Job source is set to a branch
The limitation is that Microsoft Entra federated credentials authenticate the SPN to Databricks but don't
automatically provide Git repository access credentials for Azure DevOps.
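One way to confirm this from inside the job itself is to list the Git credentials Databricks has stored for the
calling principal. This is a minimal sketch, assuming the databricks-sdk package is available on the cluster:

from databricks.sdk import WorkspaceClient

# On a Databricks cluster the client can usually pick up authentication from
# the runtime context, so no explicit host/token is passed here.
w = WorkspaceClient()

# Lists the Git credentials registered for whoever runs this code
# (the SPN when run as a job). An empty list matches the gap described above.
creds = list(w.git_credentials.list())
if creds:
    for c in creds:
        print(f"Provider: {c.git_provider}, username: {c.git_username}")
else:
    print("No Git credentials registered for this principal")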
Recommended Solutions (In Order of Preference)
1. Use Workspace Files Instead of Git Source (Most Reliable)
Switch your job configuration from "Git source" to "Workspace files":
- Deploy code to the Databricks workspace via your CI/CD pipeline (see the sketch after this list)
- No Git authentication needed at runtime
- Full repository access guaranteed
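In practice this deployment step is usually handled by Databricks Asset Bundles or the Databricks CLI, but the
idea can be sketched with the databricks-sdk as below. The source folder and target path are placeholders, and
the pipeline is assumed to authenticate through environment variables (for example DATABRICKS_HOST plus the
SPN's client credentials):

import base64
from pathlib import Path

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat

w = WorkspaceClient()  # auth picked up from the CI/CD environment

LOCAL_SRC = Path("src")         # hypothetical source folder in your repo
TARGET = "/Shared/my_job_code"  # hypothetical workspace destination

for file in LOCAL_SRC.rglob("*.py"):
    dest = f"{TARGET}/{file.relative_to(LOCAL_SRC).as_posix()}"
    # Ensure the parent folder exists, then upload the file
    w.workspace.mkdirs(str(Path(dest).parent))
    w.workspace.import_(
        path=dest,
        content=base64.b64encode(file.read_bytes()).decode(),
        format=ImportFormat.AUTO,  # .py files import as workspace files or notebooks
        overwrite=True,
    )

The job task then points at the workspace path instead of a Git reference, so nothing has to be cloned at
runtime.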
2. Manual Repository Clone in Entry Script (Quick Fix)
Add this to your entry file:
import subprocess
import sys

# Retrieve the Azure DevOps PAT from a Databricks secret scope
pat_token = dbutils.secrets.get(scope="ado-secrets", key="pat-token")
repo_url = "https://dev.azure.com/your-org/project/_git/repo"

# Clone with the PAT embedded as HTTPS basic-auth credentials
repo_url_with_auth = repo_url.replace("https://", f"https://{pat_token}@")
subprocess.run(
    ["git", "clone", "--depth", "1", "--branch", "main",
     repo_url_with_auth, "/tmp/repo"],
    check=True,  # fail fast if the clone does not succeed
)

# Make the cloned repository importable
sys.path.insert(0, "/tmp/repo")

# Now import your modules
from your_module import your_function
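A variant of the same idea avoids storing a long-lived PAT: since the SPN already has Entra ID credentials, it
can request a short-lived Azure DevOps access token at runtime and pass it to git as a bearer token. This is
only a sketch; it assumes the azure-identity package is installed on the cluster and that the SPN's credentials
are discoverable by DefaultAzureCredential (for example through environment variables). The GUID below is the
fixed, well-known Azure DevOps resource ID.

import subprocess
import sys

from azure.identity import DefaultAzureCredential

# 499b84ac-1321-427f-aa17-267ca6975798 is the well-known Azure DevOps resource ID
ado_scope = "499b84ac-1321-427f-aa17-267ca6975798/.default"
token = DefaultAzureCredential().get_token(ado_scope).token

repo_url = "https://dev.azure.com/your-org/project/_git/repo"
subprocess.run(
    [
        "git",
        "-c", f"http.extraHeader=Authorization: Bearer {token}",
        "clone", "--depth", "1", "--branch", "main",
        repo_url, "/tmp/repo",
    ],
    check=True,
)

sys.path.insert(0, "/tmp/repo")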
3. Use a Dedicated Service User (If SPNs Must Be Avoided)
Create a dedicated Databricks user account (not an SPN) specifically for automated jobs, with Git credentials
configured for that account (a registration sketch follows).
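If you go this route, the Git credential itself can be registered programmatically through the Databricks Git
Credentials API while authenticated as that dedicated user. A minimal sketch with the databricks-sdk; the
provider string is the one Databricks uses for Azure DevOps, while the username and the ADO_PAT environment
variable are placeholders:

import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # must be authenticated as the dedicated service user

# Registers an Azure DevOps PAT as this user's Git credential in Databricks
w.git_credentials.create(
    git_provider="azureDevOpsServices",
    git_username="svc-jobs@your-org.com",       # placeholder ADO username
    personal_access_token=os.environ["ADO_PAT"],  # placeholder PAT source
)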
Action Items
1. Immediate: Implement Solution #2 (manual clone) to unblock your work
2. Short-term: Test Solution #1 (workspace files) in a dev environment
3. Long-term: Monitor Databricks releases for enhanced SPN Git credential support
Verification Script
Run this in your job to confirm the behavior:
import os
import subprocess
print("=" * 50)
print("DEBUGGING GIT ACCESS")
print("=" * 50)
print(f"Current directory: {os.getcwd()}")
print(f"Directory contents: {os.listdir('.')}")
print(f"Running as: {spark.conf.get('spark.databricks.clusterUsageTags.userName')}")
# Check Git status
result = subprocess.run(['git', 'status'], capture_output=True, text=True)
if result.returncode == 0:
    print("Git repository detected")
    print(f"Git status: {result.stdout}")
else:
    print("Not in a Git repository")
    print(f"Error: {result.stderr}")
This issue is a known limitation in the Databricks platform rather than a misconfiguration on your part. The
workarounds above are standard practice for organizations using SPNs with Databricks Git-source jobs.