04-28-2026 08:48 AM
Answers to your questions.
Orchestration (replace ADF)
- Use Lakeflow Jobs (Databricks Jobs) as the primary orchestrator: one job per use case with a task graph (notebook / SQL / pipeline tasks) to express both sequential and parallel branches, retries, timeouts, and alerts.
- For ELT-heavy flows, define Lakeflow Spark Declarative Pipelines for the data pipeline itself, and call them from Lakeflow Jobs for control-flow and scheduling.
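To make the "sequential and parallel branches" idea concrete, here is a minimal plain-Python sketch of a task graph for one use case. It does not call the Databricks SDK or API. The task names are hypothetical, and `depends_on` is simplified to a list of task keys. It only shows which tasks could run side by side once their upstream tasks finish.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for one use case. The field names echo the
# task_key / depends_on idea of a job task graph; the task names are made up.
tasks = [
    {"task_key": "ingest", "depends_on": []},
    {"task_key": "clean_orders", "depends_on": ["ingest"]},
    {"task_key": "clean_customers", "depends_on": ["ingest"]},
    {"task_key": "publish", "depends_on": ["clean_orders", "clean_customers"]},
]


def execution_waves(task_list):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    graph = {t["task_key"]: t["depends_on"] for t in task_list}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(tasks)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Here the two `clean_*` tasks form the parallel branch and `publish` waits for both. A real scheduler starts each task as soon as its own dependencies finish rather than in strict waves, so treat the waves as a way to reason about the graph, not as the actual execution model.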
Project structure (20 use cases + env-specific params)
- Recommended: one repo with modular bundle configs per use case, e.g. resources/jobs/usecase_X.yml and (optionally) resources/pipelines/usecase_X.yml, plus shared cluster definitions and variables.
- Use Declarative Automation Bundles targets + variables for environment-specific values (dev/prod catalogs, Key Vault/secret scopes, workspace URLs) instead of duplicating YAML: override only what changes per target (dev, prod).
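The "override only what changes per target" idea can be modelled in a few lines of Python. This is a sketch of the behavior, not how the Databricks CLI implements it: the variable names, catalog names, and secret scope names are all assumptions, and only the `${var.<name>}` placeholder style is taken from bundle configuration.

```python
import re

# Hypothetical bundle-level defaults and per-target overrides.
DEFAULTS = {"catalog": "dev_catalog", "schema": "sales", "secret_scope": "kv-dev"}
TARGET_OVERRIDES = {
    "dev": {},  # dev simply uses the defaults
    "prod": {"catalog": "prod_catalog", "secret_scope": "kv-prod"},
}


def resolve_variables(target):
    """Start from the defaults and apply only what the target overrides."""
    if target not in TARGET_OVERRIDES:
        raise KeyError(f"unknown target: {target}")
    return {**DEFAULTS, **TARGET_OVERRIDES[target]}


def substitute(text, variables):
    """Replace ${var.name} placeholders, failing loudly on unknown names."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return re.sub(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}", lookup, text)


template = "${var.catalog}.${var.schema}.orders"
dev_table = substitute(template, resolve_variables("dev"))
prod_table = substitute(template, resolve_variables("prod"))
print(dev_table)
print(prod_table)
```

The same template yields a dev table name and a prod table name, and `schema` is defined once because it does not differ between environments.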
Shared library code (Python utils)
- Best practice is to package the shared utils as a wheel (built in CI), store it in an artifact feed or UC volume, and reference it as a job/pipeline library dependency; notebook imports stay the same, and all use cases share the same versioned package.
- Repo-syncing the whole codebase and importing via workspace-relative paths works but scales worse; prefer wheels for anything you run in prod or across multiple workspaces.
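As a rough illustration of "reference it as a job/pipeline library dependency", the helper below builds a library entry that points at one pinned wheel in a UC volume. The volume path, package name, and version are hypothetical, and the `py3-none-any` tag assumes a pure-Python wheel.

```python
from pathlib import PurePosixPath


def wheel_library(volume_dir, package, version):
    """Return a library entry pointing at one pinned wheel in a UC volume.

    Assumes a pure-Python wheel, hence the py3-none-any filename tag.
    """
    filename = f"{package}-{version}-py3-none-any.whl"
    return {"whl": str(PurePosixPath(volume_dir) / filename)}


# Hypothetical volume path, package name, and version.
lib = wheel_library("/Volumes/shared/artifacts/wheels", "shared_utils", "1.4.0")
print(lib["whl"])
```

Pinning the version in the path means a use case keeps running against the wheel it was tested with until its config is changed on purpose.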
Cross-workspace promotion (dev → prod)
- Use a Microsoft Entra ID (formerly Azure AD) service principal with OAuth workload identity federation for the Databricks CLI / Bundles; this is the recommended CI/CD auth pattern because it avoids long-lived PATs.
- Treat the SP as a first-class principal in Unity Catalog: grant it workspace access plus the required USE CATALOG, USE SCHEMA, and table privileges in each environment; many teams use separate targets and (optionally) separate SPs for dev vs prod.
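The grants for the SP can be generated per environment so dev and prod stay consistent. The sketch below only builds the SQL strings; it does not execute them. The application ID, catalog, and schema names are placeholders, and granting table privileges at schema level (so tables in that schema inherit them) is one possible choice, not something the setup above prescribes.

```python
def grant_statements(principal, catalog, schema, table_privileges=("SELECT",)):
    """Build Unity Catalog GRANT statements for one catalog/schema pair."""
    quoted = f"`{principal}`"  # backticks allow hyphens in the principal name
    privileges = ", ".join(table_privileges)
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted}",
        # Assumption: table privileges are granted on the schema so that
        # tables inside it inherit them.
        f"GRANT {privileges} ON SCHEMA {catalog}.{schema} TO {quoted}",
    ]


# Placeholder application ID of the service principal.
SP_APP_ID = "00000000-0000-0000-0000-000000000000"

statements = grant_statements(SP_APP_ID, "prod_catalog", "sales", ("SELECT", "MODIFY"))
for statement in statements:
    print(statement)
```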
CI/CD (Azure DevOps, Terraform, Databricks-native)
- Most teams keep Azure DevOps (or GitHub Actions/Jenkins) as their CI/CD engine and introduce Declarative Automation Bundles for Databricks-side IaC (jobs, pipelines, clusters, permissions); ADO just runs databricks bundle validate and databricks bundle deploy steps with --target dev or --target prod.
- Terraform remains useful for workspace-level infra (workspaces, networks, storage, UC metastore), while Bundles manage workloads and configs (jobs/pipelines/dashboards/etc.) in the same repo as your notebooks and Python code.
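What the CI engine runs can be as small as the wrapper below. It is a sketch that assumes the Databricks CLI is already installed and authenticated on the build agent. The function names and the `dry_run` flag are made up for illustration; only the `databricks bundle validate` and `databricks bundle deploy` commands with `--target` come from the steps described above.

```python
import subprocess

ALLOWED_TARGETS = ("dev", "prod")


def bundle_commands(target):
    """Return the validate and deploy commands for one bundle target."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target}")
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def run_bundle_steps(target, dry_run=True):
    """Run validate, then deploy; stop at the first failing command."""
    commands = bundle_commands(target)
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError, which fails the CI step.
            subprocess.run(command, check=True)
    return commands


planned = run_bundle_steps("dev")  # dry run: nothing is executed
for command in planned:
    print(" ".join(command))
```

Running validate before deploy means a broken config fails the pipeline before anything in the workspace changes.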
Incremental migration (ADF → Lakeflow Jobs)
- Yes: you can migrate one use case at a time by creating the equivalent Lakeflow Job/Pipeline, validating it in dev, then switching only that use case’s schedule from ADF to Databricks; the rest continue to run in ADF without interference.
- Just ensure each pipeline has one active scheduler (disable or pause the corresponding ADF pipeline once the Databricks job is live) and keep all storage/UC references identical so data remains consistent.
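During the migration, the "one active scheduler" rule is easy to check from a simple inventory. The sketch below assumes you maintain such an inventory yourself; the use case names and the `adf` / `databricks` flags are hypothetical and nothing here queries ADF or Databricks.

```python
def scheduler_conflicts(use_cases):
    """Return use cases that do not have exactly one active scheduler."""
    problems = {}
    for name, state in use_cases.items():
        active = [s for s in ("adf", "databricks") if state.get(s, False)]
        if len(active) != 1:
            problems[name] = active
    return problems


# Hypothetical migration inventory.
inventory = {
    "usecase_01": {"adf": False, "databricks": True},   # migrated
    "usecase_02": {"adf": True, "databricks": False},   # still on ADF
    "usecase_03": {"adf": True, "databricks": True},    # double-scheduled
    "usecase_04": {"adf": False, "databricks": False},  # nothing scheduled
}

conflicts = scheduler_conflicts(inventory)
for name, active in conflicts.items():
    print(name, active)
```

A check like this flags both failure modes of a cutover: a use case that runs twice and a use case that no longer runs at all.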
Relevant Azure Databricks docs:
- CI/CD on Azure Databricks: high-level patterns + tool choices
  https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/
- What are Declarative Automation Bundles? (core to organizing jobs/pipelines + envs)
  https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/
- Tutorial – Develop a job with Declarative Automation Bundles:
  https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/jobs-tutorial
- Tutorial – Develop pipelines with Declarative Automation Bundles:
  https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/pipelines-tutorial
- CI/CD with Azure DevOps on Azure Databricks:
  https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/azure-devops