Re: Transitioning from ADF to Databricks Workflows...

Lu_Wang_ENB_DBX · 04-28-2026

Answers to your questions.

Orchestration (replace ADF)
- Use Lakeflow Jobs (Databricks Jobs) as the primary orchestrator: one job per use case, with a task graph (notebook / SQL / pipeline tasks) that expresses both sequential and parallel branches, plus retries, timeouts, and alerts configured on the job and its tasks.
- For ELT-heavy flows, define Lakeflow Spark Declarative Pipelines for the data pipeline itself, and call them from Lakeflow Jobs for control-flow and scheduling.
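
To make the task-graph idea concrete, here is a minimal sketch in plain Python. The field names (task_key, depends_on, notebook_task, pipeline_task, max_retries, timeout_seconds) follow the Jobs task settings, but the task keys, notebook paths, and pipeline id are placeholders I made up, and the helper only illustrates which tasks could run in parallel; it is not how the scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for one use case. Keys, paths, and the pipeline id
# are placeholders; only the field names mirror the Jobs task settings.
tasks = [
    {"task_key": "ingest",
     "notebook_task": {"notebook_path": "/Workspace/usecase_x/ingest"},
     "max_retries": 2, "timeout_seconds": 1800},
    {"task_key": "transform",
     "depends_on": [{"task_key": "ingest"}],
     "pipeline_task": {"pipeline_id": "<pipeline-id>"}},
    {"task_key": "quality_checks",
     "depends_on": [{"task_key": "ingest"}],
     "notebook_task": {"notebook_path": "/Workspace/usecase_x/checks"}},
    {"task_key": "publish",
     "depends_on": [{"task_key": "transform"}, {"task_key": "quality_checks"}],
     "notebook_task": {"notebook_path": "/Workspace/usecase_x/publish"}},
]


def execution_waves(tasks):
    """Group task keys into waves; tasks in the same wave have no mutual dependency."""
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(tasks))
```

Here transform and quality_checks both depend only on ingest, so they form a parallel branch, and publish waits for both.
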
Project structure (20 use cases + env-specific params)
- Recommended: one repo with modular bundle configs per use case, e.g. resources/jobs/usecase_X.yml and (optionally) resources/pipelines/usecase_X.yml, plus shared cluster definitions and variables.
- Use Declarative Automation Bundles targets + variables for environment-specific values (dev/prod catalogs, Key Vault/secret scopes, workspace URLs) instead of duplicating YAML: override only what changes per target (dev, prod).
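
The "override only what changes" idea can be sketched in a few lines of Python. This only illustrates the override semantics and is not how bundles resolve variables internally; the variable names and values (catalog, secret_scope, schedule_paused) are hypothetical.

```python
# Hypothetical shared defaults, defined once for every use case.
DEFAULTS = {
    "catalog": "dev_catalog",
    "secret_scope": "kv-dev",
    "schedule_paused": True,
}

# Each target lists only the values that differ from the defaults.
TARGET_OVERRIDES = {
    "dev": {},
    "prod": {
        "catalog": "prod_catalog",
        "secret_scope": "kv-prod",
        "schedule_paused": False,
    },
}


def resolve_variables(target):
    """Return the effective variables for a target: defaults plus its overrides."""
    if target not in TARGET_OVERRIDES:
        raise ValueError(f"unknown target: {target}")
    return {**DEFAULTS, **TARGET_OVERRIDES[target]}


print(resolve_variables("dev"))
print(resolve_variables("prod"))
```

The job and pipeline definitions reference the variable names only, so the same definition deploys unchanged to both environments.
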
Shared library code (Python utils)
- Best practice is to package the shared utils as a wheel (built in CI), store it in an artifact feed or UC volume, and reference it as a job/pipeline library dependency; notebook imports stay the same, and all use cases share the same versioned package.
- Repo-syncing the whole codebase and importing via workspace-relative paths works but scales worse; prefer wheels for anything you run in prod or across multiple workspaces.
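
As a rough sketch of what "reference it as a library dependency" looks like, the snippet below builds a task entry that points at a versioned wheel in a UC volume. The volume path, the package name company_utils, and the version are assumptions for illustration; the whl library field and the pure-Python wheel filename pattern are the standard ones.

```python
def wheel_library(volume_dir, package, version):
    """Return a library entry for a pure-Python wheel stored in a UC volume."""
    filename = f"{package}-{version}-py3-none-any.whl"
    return {"whl": f"{volume_dir.rstrip('/')}/{filename}"}


# Hypothetical task: every use case pins the same versioned package.
task = {
    "task_key": "transform",
    "notebook_task": {"notebook_path": "/Workspace/usecase_x/transform"},
    "libraries": [
        wheel_library("/Volumes/shared/artifacts/wheels", "company_utils", "1.4.0")
    ],
}

print(task["libraries"][0]["whl"])
```

Bumping the version in one place then rolls the new utils out to each use case on its next deploy, while older jobs can stay pinned until they are tested.
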
Cross-workspace promotion (dev → prod)
- Use a Microsoft Entra ID (formerly Azure AD) service principal with OAuth workload identity federation for the Databricks CLI / Bundles; this is the recommended, most secure CI/CD auth pattern and avoids long-lived PATs.
- Treat the SP as a first-class principal in Unity Catalog: grant it workspace access plus the required USE CATALOG, USE SCHEMA, and table privileges in each environment; many teams use separate targets and (optionally) separate SPs for dev vs prod.
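
The Unity Catalog grants can be generated per environment so dev and prod stay in step. A minimal sketch, assuming the SP is referenced by its application ID and that SELECT and MODIFY are the table privileges you need; the catalog, schema, table names, and ID below are placeholders.

```python
def grant_statements(principal, catalog, schema, tables,
                     table_privileges=("SELECT", "MODIFY")):
    """Build the GRANT statements one service principal needs in one environment."""
    quoted = f"`{principal}`"  # backticks because the ID contains hyphens
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted}",
    ]
    privileges = ", ".join(table_privileges)
    for table in tables:
        statements.append(
            f"GRANT {privileges} ON TABLE {catalog}.{schema}.{table} TO {quoted}"
        )
    return statements


# Placeholder application ID and object names.
for stmt in grant_statements(
    "00000000-0000-0000-0000-000000000000", "prod_catalog", "sales", ["orders"]
):
    print(stmt)
```

Run the output once per environment with an account that is allowed to grant on those objects, and review the privilege list first.
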
CI/CD (Azure DevOps, Terraform, Databricks-native)
- Most teams keep Azure DevOps (or GitHub Actions/Jenkins) as their CI/CD engine and introduce Declarative Automation Bundles for Databricks-side IaC (jobs, pipelines, clusters, permissions); ADO just runs databricks bundle validate and databricks bundle deploy steps, each with --target dev or --target prod.
- Terraform remains useful for workspace-level infra (workspaces, networks, storage, UC metastore), while Bundles manage workloads and configs (jobs/pipelines/dashboards/etc.) in the same repo as your notebooks and Python code.
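
The pipeline step itself is small. Below is a hedged sketch of a Python wrapper a CI stage could call; it assumes the Databricks CLI is installed and already authenticated on the agent, and the target names dev and prod match the ones defined in your bundle. By default it only prints the commands.

```python
import subprocess

ALLOWED_TARGETS = ("dev", "prod")  # assumed to match the targets in the bundle


def bundle_commands(target):
    """Return the CLI invocations for one target: validate first, then deploy."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target}")
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def run_bundle(target, dry_run=True):
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in bundle_commands(target):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # non-zero exit fails the CI step
```

In practice most teams just put the two CLI calls directly in the ADO YAML; a wrapper like this mainly helps if you want to add checks between validate and deploy.
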
Incremental migration (ADF → Lakeflow Jobs)
- Yes: you can migrate one use case at a time by creating the equivalent Lakeflow Job/Pipeline, validating it in dev, then switching only that use case’s schedule from ADF to Databricks; the rest continue to run in ADF without interference.
- Just ensure each pipeline has exactly one active scheduler at any time (disable or pause the corresponding ADF pipeline once the Databricks job is live) and keep all storage/UC references identical so data remains consistent.
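
During the migration it helps to keep a simple inventory and check the one-scheduler rule mechanically. A minimal sketch, assuming you maintain the inventory yourself (the use case names and flags below are made up; nothing here queries ADF or Databricks):

```python
def scheduler_conflicts(inventory):
    """Return use cases that do not have exactly one active scheduler."""
    problems = {}
    for use_case, state in inventory.items():
        active = [name for name in ("adf", "databricks") if state.get(name)]
        if len(active) != 1:
            problems[use_case] = active
    return problems


# Hypothetical migration inventory: True means the schedule is enabled there.
inventory = {
    "usecase_a": {"adf": False, "databricks": True},  # migrated
    "usecase_b": {"adf": True, "databricks": False},  # still on ADF
    "usecase_c": {"adf": True, "databricks": True},   # double-scheduled
}

print(scheduler_conflicts(inventory))
```

An empty list for a use case means nothing is scheduling it at all, which is worth catching just as much as a double schedule.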

Relevant Azure Databricks docs:

CI/CD on Azure Databricks: high-level patterns + tool choices
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/
What are Declarative Automation Bundles? (core to organizing jobs/pipelines + envs)
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/
Tutorial – Develop a job with Declarative Automation Bundles:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/jobs-tutorial
Tutorial – Develop pipelines with Declarative Automation Bundles:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/pipelines-tutorial
CI/CD with Azure DevOps on Azure Databricks:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/azure-devops