
Transitioning from ADF to Databricks Workflows: Best Practices in a Multi-Workspace (dev-prod) Setup

Darshan137
New Contributor II

Hi Community,

We have a data processing framework running on Azure Databricks with Unity Catalog, and we're evaluating options to consolidate our orchestration entirely within the Databricks ecosystem.

CURRENT ARCHITECTURE:

  • ~20 use cases, each containing 3-6 Python notebooks organized by business domain
  • A shared Python utility package (with an __init__.py) used across all use cases
  • Two Databricks workspaces: Development and Production
  • Unity Catalog for data governance and storage
  • Azure Data Factory for orchestrating notebook execution (task ordering, dependencies)
  • Azure DevOps CI/CD pipelines (one per use case) deploying notebooks to workspaces via Terraform templates
  • Environment-specific configs (Key Vault names, service connections, catalog references) managed through ADO variable groups and YAML templates
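
To make the current setup concrete, here is roughly the shape of one of our per-use-case Azure DevOps pipelines. This is only a sketch; the variable group names, variables, and Terraform inputs are illustrative, not our real ones:

```yaml
# azure-pipelines.yml for one use case (illustrative names only)
trigger:
  branches:
    include: [ main ]

variables:
  - group: usecase-orders-dev   # hypothetical ADO variable group: workspace URL, catalog name, Key Vault

stages:
  - stage: deploy_dev
    jobs:
      - job: terraform_apply
        steps:
          - script: |
              terraform init
              terraform apply -auto-approve \
                -var "workspace_host=$(DATABRICKS_HOST)" \
                -var "catalog_name=$(CATALOG_NAME)"
            displayName: Deploy notebooks and job definitions via Terraform
```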

WHAT WE WANT TO ACHIEVE:

  • Replace ADF orchestration with native Databricks orchestration (Lakeflow Jobs / Pipelines)
  • Manage environment-specific parameters (dev/prod catalog names, Key Vault, etc.) cleanly across workspaces (a rough sketch of what we mean follows this list)
  • Keep our shared Python utility package working across all use cases without duplication
  • Zero changes to existing notebook code
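
For the environment-parameter point above, what we are imagining is something along the lines of Databricks Asset Bundle targets. This is only a sketch; the hosts, catalog names, and variable names are made up:

```yaml
# databricks.yml (sketch; all names illustrative)
bundle:
  name: data_platform

variables:
  catalog:
    description: Unity Catalog catalog used by the notebooks
    default: dev_catalog
  key_vault_scope:
    description: Secret scope backed by the environment's Key Vault
    default: kv-dev

targets:
  dev:
    default: true
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net
    variables:
      catalog: prod_catalog
      key_vault_scope: kv-prod
```

The idea would be that notebooks receive the catalog as a task parameter (for example a base_parameters entry set to ${var.catalog}), so the notebook code itself stays unchanged between dev and prod.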

QUESTIONS:

  1. Orchestration: What is the recommended Databricks-native approach to replace ADF for orchestrating notebook execution with task dependencies? We need both sequential and parallel task support. (The first sketch after these questions shows the kind of job definition we have in mind; it also touches on question 2.)

  2. Project structure: With ~20 use cases, what is the recommended way to organize job/pipeline definitions? One monolithic config vs. modular per-use-case definitions?

  3. Shared library code: Our notebooks import from a shared Python package. What is the best way to handle this: sync the entire repo into the workspace, or package it as a wheel? (The second sketch below shows the wheel option we are weighing.)

  4. Cross-workspace promotion: For promoting from dev to prod workspace, what authentication method works best - Service Principal with OAuth (M2M) or PAT tokens? Any Unity Catalog permission considerations?

  5. CI/CD: We currently use Azure DevOps plus Terraform to deploy notebook code and job definitions to both workspaces. For those who have made a similar migration: does it make sense to replace Azure DevOps with a Databricks-native deployment approach, or do most teams keep an external CI/CD tool alongside Databricks orchestration? (The last sketch below shows the deploy step we currently picture, including the service-principal auth from question 4.)

  6. Incremental migration: Can we migrate one use case at a time while others still run via the legacy ADF setup, without conflicts?
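
To illustrate questions 1 and 2, here is the kind of per-use-case job definition we have in mind: one sequential step fanning out into parallel tasks, defined in an asset bundle resource file. The task keys, notebook paths, and the one-file-per-use-case layout are assumptions on our side, not something we have validated:

```yaml
# resources/orders.job.yml (one file per use case; all names illustrative)
resources:
  jobs:
    orders_daily:
      name: orders_daily_${bundle.target}
      # cluster / compute configuration omitted for brevity
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../notebooks/orders/01_ingest.py
            base_parameters:
              catalog: ${var.catalog}
        - task_key: transform
          depends_on:
            - task_key: ingest            # runs after ingest (sequential)
          notebook_task:
            notebook_path: ../notebooks/orders/02_transform.py
            base_parameters:
              catalog: ${var.catalog}
        - task_key: publish_report
          depends_on:
            - task_key: transform         # this task and the next run in parallel
          notebook_task:
            notebook_path: ../notebooks/orders/03_report.py
        - task_key: publish_exports
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../notebooks/orders/04_exports.py
```

Each use case would get its own resource file pulled in through an include pattern in databricks.yml, which is why we currently lean towards modular per-use-case definitions rather than one monolithic config.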
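
For question 3, the wheel-based option we are considering would look roughly like this; the artifact name, package path, and build command are placeholders:

```yaml
# databricks.yml excerpt (sketch): build the shared package once, attach it to job tasks
artifacts:
  shared_utils:
    type: whl
    path: ./libs/shared_utils          # folder containing the package's pyproject.toml / setup.py
    build: python -m build --wheel

# and inside a job task definition:
#   libraries:
#     - whl: ./libs/shared_utils/dist/*.whl
```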
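
And for questions 4 and 5, our working assumption is to keep Azure DevOps as the CI/CD driver and have it call the Databricks CLI with a service principal's OAuth (M2M) credentials. Again a sketch only; the stage layout and variable names are placeholders:

```yaml
# Azure DevOps deploy stage (sketch; variable names are placeholders)
- stage: deploy_prod
  jobs:
    - job: bundle_deploy
      steps:
        - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          displayName: Install Databricks CLI
        - script: |
            databricks bundle validate -t prod
            databricks bundle deploy -t prod
          displayName: Deploy bundle to prod workspace
          env:
            DATABRICKS_HOST: $(PROD_WORKSPACE_URL)
            DATABRICKS_CLIENT_ID: $(PROD_SP_CLIENT_ID)          # service principal application ID
            DATABRICKS_CLIENT_SECRET: $(PROD_SP_CLIENT_SECRET)  # OAuth secret for M2M auth
```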

Any real-world experience, recommended approaches, or reference architectures would be very helpful. If there is a tutorial or reference guide available for this kind of migration, please share the link as well.

Thanks!
