Lu_Wang_ENB_DBX
Databricks Employee

Answers to your questions.

  1. Orchestration (replace ADF)

    • Use Lakeflow Jobs (Databricks Jobs) as the primary orchestrator: one job per use case, with a task graph of notebook, SQL, or pipeline tasks that expresses both sequential and parallel branches, plus per-task retries, timeouts, and alerts.
    • For ELT-heavy flows, define the data pipeline itself as a Lakeflow Spark Declarative Pipeline and trigger it from a Lakeflow Job, which handles control flow and scheduling.
  2. Project structure (20 use cases + env-specific params)

    • Recommended: one repo with modular bundle configs per use case, e.g. resources/jobs/usecase_X.yml and (optionally) resources/pipelines/usecase_X.yml, plus shared cluster definitions and variables.
    • Use Declarative Automation Bundles targets and variables for environment-specific values (dev/prod catalogs, Key Vault/secret scopes, workspace URLs) instead of duplicating YAML: override only what changes per target (dev, prod).
  3. Shared library code (Python utils)

    • Best practice is to package the shared utils as a wheel (built in CI), store it in an artifact feed or UC volume, and reference it as a job/pipeline library dependency; notebook imports stay the same, and all use cases share the same versioned package.
    • Repo-syncing the whole codebase and importing via workspace-relative paths works but scales worse; prefer wheels for anything you run in prod or across multiple workspaces.
  4. Cross-workspace promotion (dev → prod)

    • Use a Microsoft Entra ID (formerly Azure AD) service principal with OAuth workload identity federation to authenticate the Databricks CLI / Bundles; this is the recommended, most secure CI/CD auth pattern and avoids long-lived PATs.
    • Treat the SP as a first-class principal in Unity Catalog: grant it workspace access plus the required USE CATALOG, USE SCHEMA, and table privileges in each environment; many teams use separate targets and (optionally) separate SPs for dev vs prod.
  5. CI/CD (Azure DevOps, Terraform, Databricks-native)

    • Most teams keep Azure DevOps (or GitHub Actions/Jenkins) as their CI/CD engine and introduce Declarative Automation Bundles for Databricks-side IaC (jobs, pipelines, clusters, permissions); the ADO pipeline then just runs databricks bundle validate and databricks bundle deploy --target dev (or --target prod) steps.
    • Terraform remains useful for workspace-level infra (workspaces, networks, storage, UC metastore), while Bundles manage workloads and configs (jobs/pipelines/dashboards/etc.) in the same repo as your notebooks and Python code.
  6. Incremental migration (ADF → Lakeflow Jobs)

    • Yes: you can migrate one use case at a time by creating the equivalent Lakeflow Job/Pipeline, validating it in dev, then switching only that use case’s schedule from ADF to Databricks; the rest continue to run in ADF without interference.
    • Just ensure each pipeline has one active scheduler (disable or pause the corresponding ADF pipeline once the Databricks job is live) and keep all storage/UC references identical so data remains consistent.
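For point 1 (orchestration), here is a rough sketch of how one use case's task graph expresses sequential and parallel branches. The field names (task_key, depends_on, notebook_task, pipeline_task, max_retries, timeout_seconds, email_notifications) follow the Jobs API. The job name, notebook paths, e-mail address, and pipeline ID are hypothetical. The execution_waves helper is only an illustration: it groups tasks that could run side by side. The real scheduler starts each task as soon as its own dependencies finish.

```python
from graphlib import TopologicalSorter

# Hypothetical job for one use case; field names follow the Jobs API,
# all values (paths, keys, addresses, IDs) are made up for illustration.
JOB = {
    "name": "usecase_sales_daily",
    "email_notifications": {"on_failure": ["data-team@example.com"]},
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Workspace/usecases/sales/ingest"},
         "max_retries": 2, "timeout_seconds": 3600},
        # These two depend only on ingest, so they can run in parallel.
        {"task_key": "clean_orders", "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Workspace/usecases/sales/clean_orders"}},
        {"task_key": "clean_customers", "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Workspace/usecases/sales/clean_customers"}},
        # Fan-in: triggers a declarative pipeline once both branches succeed.
        {"task_key": "refresh_gold",
         "depends_on": [{"task_key": "clean_orders"}, {"task_key": "clean_customers"}],
         "pipeline_task": {"pipeline_id": "<pipeline-id>"}},
    ],
}


def execution_waves(tasks):
    """Group tasks into waves: tasks within a wave have no dependencies on
    each other (parallel); waves themselves run one after another."""
    graph = {t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
             for t in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(JOB["tasks"]))
# [['ingest'], ['clean_customers', 'clean_orders'], ['refresh_gold']]
```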
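For point 2 (env-specific params), this is a toy model of the "define once, override only what changes per target" idea. Bundles reference variables as ${var.<name>}. The variable names, catalog names, and secret scope names below are hypothetical. The resolve function only mimics the effect. The Databricks CLI does the real substitution when you deploy with --target.

```python
import re

# Hypothetical defaults shared by every target.
VARIABLES = {
    "catalog": "dev_catalog",
    "secret_scope": "kv-dev",
}

# Per-target overrides: only the values that differ are listed.
TARGET_OVERRIDES = {
    "dev": {},
    "prod": {"catalog": "prod_catalog", "secret_scope": "kv-prod"},
}

# Matches ${var.name} references.
_VAR_REF = re.compile(r"\$\{var\.([A-Za-z0-9_-]+)\}")


def resolve(template: str, target: str) -> str:
    """Substitute ${var.*} references using defaults plus target overrides."""
    values = {**VARIABLES, **TARGET_OVERRIDES[target]}
    return _VAR_REF.sub(lambda m: values[m.group(1)], template)


table = "${var.catalog}.sales.orders"
print(resolve(table, "dev"))   # dev_catalog.sales.orders
print(resolve(table, "prod"))  # prod_catalog.sales.orders
```

The shared job YAML stays identical across environments. Adding a third target means adding one small override entry, not copying 20 job files.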
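For point 3 (shared library code), this sketch builds the job-level library entry that pins every use case to one versioned wheel stored in a UC volume. The {"whl": ...} shape matches the Jobs API libraries field. The volume path, the distribution name shared-utils, and the version are hypothetical. The filename helper follows the wheel naming convention for pure-Python wheels and assumes the version is already normalized.

```python
import re


def wheel_filename(dist_name: str, version: str) -> str:
    """Pure-Python wheel filename: {name}-{version}-py3-none-any.whl.
    The name is normalized: runs of '-', '_', '.' become '_', lowercased."""
    normalized = re.sub(r"[-_.]+", "_", dist_name).lower()
    return f"{normalized}-{version}-py3-none-any.whl"


def job_library_entry(volume_dir: str, dist_name: str, version: str) -> dict:
    """Jobs API-style library entry pointing at a pinned wheel in a UC volume."""
    return {"whl": f"{volume_dir.rstrip('/')}/{wheel_filename(dist_name, version)}"}


# Hypothetical volume path and package; CI would upload the wheel here.
lib = job_library_entry("/Volumes/shared/libs/wheels/", "shared-utils", "1.4.0")
print(lib)
# {'whl': '/Volumes/shared/libs/wheels/shared_utils-1.4.0-py3-none-any.whl'}
```

Bumping the version in one place, such as a bundle variable, then moves all 20 use cases to the new release together.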
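For point 4 (cross-workspace promotion), this sketch generates the Unity Catalog grants a deployment service principal needs in each environment. In Databricks SQL, a service principal is referenced by its application ID in backticks. The application IDs, catalog, schema, and table names are hypothetical. Whether the SP needs MODIFY as well as SELECT depends on what the jobs write.

```python
def grant_statements(sp_app_id, catalog, schema, tables, write=False):
    """Return GRANT statements giving a service principal access to tables."""
    principal = f"`{sp_app_id}`"  # backticks needed: the app ID contains '-'
    stmts = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal};",
    ]
    privileges = "SELECT, MODIFY" if write else "SELECT"
    for table in tables:
        stmts.append(
            f"GRANT {privileges} ON TABLE {catalog}.{schema}.{table} TO {principal};"
        )
    return stmts


# Hypothetical: one SP per environment, each scoped to its own catalog.
ENVIRONMENTS = {
    "dev": ("00000000-0000-0000-0000-000000000001", "dev_catalog"),
    "prod": ("00000000-0000-0000-0000-000000000002", "prod_catalog"),
}

for env, (app_id, catalog) in ENVIRONMENTS.items():
    for stmt in grant_statements(app_id, catalog, "sales", ["orders"], write=True):
        print(f"[{env}] {stmt}")
```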
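For point 5 (CI/CD), this sketch shows the order of the Databricks-side steps an Azure DevOps stage runs: validate first, then deploy, stopping on the first failure. In ADO these are usually plain script steps. The Python wrapper exists only to make the sequence and fail-fast behavior explicit. It assumes the agent already has the Databricks CLI installed and authenticated, for example through the service principal from point 4. The dev and prod target names are assumed to match the bundle's targets.

```python
import subprocess

ALLOWED_TARGETS = {"dev", "prod"}  # assumed to match the bundle's targets


def bundle_commands(target: str) -> list[list[str]]:
    """CLI invocations for one target: validate first, then deploy."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target}")
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def run(target: str, dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, execute it and stop
    (CalledProcessError) on the first non-zero exit code."""
    for cmd in bundle_commands(target):
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run("dev")  # dry run: prints the two commands without executing them
```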
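For point 6 (incremental migration), the "one active scheduler per pipeline" rule can be checked mechanically during the cut-over. The inventory below is hypothetical and hand-written. In practice you might fill it from the ADF trigger state and the Databricks job's schedule pause_status (PAUSED / UNPAUSED).

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    adf_trigger_active: bool
    databricks_schedule_active: bool


def scheduler_problems(use_cases):
    """Flag use cases with zero or two active schedulers."""
    problems = []
    for uc in use_cases:
        active = int(uc.adf_trigger_active) + int(uc.databricks_schedule_active)
        if active == 2:
            problems.append(f"{uc.name}: scheduled in BOTH ADF and Databricks")
        elif active == 0:
            problems.append(f"{uc.name}: no active scheduler")
    return problems


# Hypothetical mid-migration inventory.
inventory = [
    UseCase("sales_daily", adf_trigger_active=False, databricks_schedule_active=True),      # migrated
    UseCase("inventory_hourly", adf_trigger_active=True, databricks_schedule_active=False), # still on ADF
    UseCase("finance_monthly", adf_trigger_active=True, databricks_schedule_active=True),   # cut-over unfinished
]

for problem in scheduler_problems(inventory):
    print(problem)
# finance_monthly: scheduled in BOTH ADF and Databricks
```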

Relevant Azure Databricks docs: