Hi @ajay_wavicle,
There are several well-established patterns for managing Databricks resources with the Terraform provider and the Databricks CLI without running everything locally. Here is a breakdown of the options and a recommended approach.
WHERE TO RUN TERRAFORM AND THE DATABRICKS CLI
The key is to move execution into a CI/CD pipeline so that no one needs to run terraform apply or databricks CLI commands from their laptop. The most common options are:
1. GitHub Actions
2. Azure DevOps Pipelines
3. GitLab CI/CD
4. Jenkins
In each case, the pipeline runner installs both the Terraform CLI and the Databricks CLI, authenticates with a service principal, and executes the commands on your behalf (a complete GitHub Actions example is included at the end of this reply).
AUTHENTICATION FOR AUTOMATED PIPELINES
For CI/CD, use OAuth machine-to-machine (M2M) authentication with a Databricks service principal. This avoids personal access tokens and keeps credentials scoped and rotatable.
You will need:
- A Databricks service principal with the appropriate workspace permissions
- The client ID and client secret stored as pipeline secrets (not in code)
- Environment variables set in the pipeline:
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="<your-sp-client-id>"
export DATABRICKS_CLIENT_SECRET="<your-sp-client-secret>"
The Terraform provider picks these up automatically:
provider "databricks" {
# No hardcoded credentials needed; uses env vars
}
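If you would rather pass the values explicitly (for example, from pipeline-injected Terraform variables), the provider also accepts them as arguments. A minimal sketch, with variable names of my own choosing:

variable "databricks_host" {
  type = string
}

variable "databricks_client_id" {
  type      = string
  sensitive = true
}

variable "databricks_client_secret" {
  type      = string
  sensitive = true
}

provider "databricks" {
  host          = var.databricks_host
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

In most pipelines the environment-variable approach is the simpler of the two, since nothing has to be wired through Terraform variables.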
Docs: https://docs.databricks.com/aws/en/dev-tools/auth/index.html
TERRAFORM PROVIDER SETUP
1. Store your .tf files in a Git repository.
2. Use a remote backend for Terraform state (S3 + DynamoDB for AWS, Azure Blob Storage for Azure, or Terraform Cloud/HCP). This way, state is shared across the team and pipeline runs, not on anyone's local machine.
3. Structure your repo with modules for reusable components.
Example project layout:
my-databricks-infra/
  main.tf
  variables.tf
  outputs.tf
  modules/
    workspace-config/
      main.tf
    unity-catalog/
      main.tf
  environments/
    dev.tfvars
    staging.tfvars
    prod.tfvars
A minimal main.tf:
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }

  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "databricks/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-lock"
  }
}
provider "databricks" {}
Databricks maintains an examples repository with CI/CD patterns for GitHub Actions and Azure DevOps:
https://github.com/databricks/terraform-databricks-examples
Look at the "manual-approve-with-github-actions" and "manual-approve-with-azure-devops" folders for ready-to-use pipeline templates.
Provider registry docs: https://registry.terraform.io/providers/databricks/databricks/latest/docs
DATABRICKS CLI IN CI/CD
Install the CLI in your pipeline with:
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
Then use it for file operations, workspace sync, or running bundle commands. The same environment variables (DATABRICKS_HOST, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET) authenticate the CLI automatically.
For file export and import specifically (the flag syntax below follows the current unified CLI; run databricks workspace export --help to confirm for your version):
# Export a notebook from the workspace to a local file
databricks workspace export /Users/someone/notebook.py --file ./local-copy.py
# Import a local file into the workspace
databricks workspace import /Users/someone/notebook.py --file ./local-copy.py
# One-time sync of a local directory to the workspace (add --watch for continuous sync)
databricks sync ./src /Workspace/Users/someone/project
RECOMMENDED APPROACH: DATABRICKS ASSET BUNDLES
If your goal is to manage and migrate Databricks resources (jobs, notebooks, pipelines) across environments, consider Databricks Asset Bundles (DABs). They combine the best of both worlds: you define resources as YAML configuration files in Git, and the Databricks CLI handles deployment.
DABs support:
- Multi-environment promotion (dev, staging, prod) through "targets"
- Service principal authentication for CI/CD
- Automatic resource naming and isolation in dev mode
- GitHub Actions integration for automated deployments
A quick example of a databricks.yml:
bundle:
  name: my_project

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com

  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
    run_as:
      service_principal_name: <prod-sp-application-id>

Note that run_as.service_principal_name takes the service principal's application ID, not a display name or email address.
Deploy with:
databricks bundle deploy --target prod
Docs: https://docs.databricks.com/aws/en/dev-tools/bundles/index.html
Deployment modes: https://docs.databricks.com/aws/en/dev-tools/bundles/deployment-modes.html
WHEN TO USE TERRAFORM VS. ASSET BUNDLES
- Use the Terraform provider for workspace-level infrastructure: creating workspaces, configuring Unity Catalog, managing IAM roles, setting up networking, and provisioning cloud resources.
- Use Databricks Asset Bundles for application-level resources: deploying jobs, notebooks, pipelines, and ML experiments across environments.
- Many teams use both: Terraform for the "platform layer" and DABs for the "application layer." A small sketch of the platform side follows below.
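To make the split concrete, here is what the "platform layer" can look like in Terraform; the catalog, schema, and group names are only illustrative:

# Unity Catalog objects managed by Terraform (the "platform layer")
resource "databricks_catalog" "analytics" {
  name    = "analytics"
  comment = "Managed by Terraform"
}

resource "databricks_schema" "bronze" {
  catalog_name = databricks_catalog.analytics.name
  name         = "bronze"
}

# Grant a group access to the catalog
resource "databricks_grants" "analytics_usage" {
  catalog = databricks_catalog.analytics.name

  grant {
    principal  = "data-engineers"
    privileges = ["USE_CATALOG", "USE_SCHEMA"]
  }
}

The jobs, notebooks, and pipelines that run against these objects would then live in a bundle's databricks.yml and be promoted through its targets.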
SAMPLE GITHUB ACTIONS WORKFLOW
Here is a simplified GitHub Actions workflow that runs Terraform with no local execution. Note that the S3 backend above also needs AWS credentials in the job (for example via the aws-actions/configure-aws-credentials action); that step is omitted here for brevity.
name: Deploy Databricks Infra

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.0
      - run: terraform init
      - run: terraform plan -var-file=environments/prod.tfvars
      - run: terraform apply -auto-approve -var-file=environments/prod.tfvars
This keeps everything in version control and running in the cloud, with no local execution required.
Hope this helps you get set up. Let me know if you have follow-up questions about any of these patterns.
* This reply used an agent system I built to research and draft the response from the documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect any drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.