Hey mbanxp!
The most scalable and maintainable orchestration pattern for multi-tenant medallion architectures in Databricks is to build one pipeline per table that serves all clients, with each pipeline parameterized by client/tenant.
Why this approach?
- Centralizes business logic for each table (reduces code duplication).
- Makes onboarding new clients easy: just add configuration, don't duplicate pipeline code.
- Scales well as data and client count grow.
- Fits well with Databricks Workflows and Delta Live Tables (DLT), which support parameterized, multi-tenant pipelines and robust orchestration.
- Unity Catalog provides strong data isolation and governance at the client level, even when sharing pipelines.
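To make the "just add configuration" point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the client names, the `ClientConfig` fields, the landing paths, and the one-catalog-per-client layout with `bronze`/`silver`/`gold` schemas are hypothetical, not something Databricks prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    client_id: str
    catalog: str      # hypothetical: one Unity Catalog catalog per client
    source_path: str  # hypothetical landing location for raw files


# Onboarding a client means adding one entry here; no pipeline code changes.
CLIENTS = {
    "acme": ClientConfig("acme", "acme_prod", "/landing/acme"),
    "globex": ClientConfig("globex", "globex_prod", "/landing/globex"),
}

# One pipeline definition per table, shared by every client.
TABLE_PIPELINES = {
    "orders": {"layers": ["bronze", "silver", "gold"]},
    "customers": {"layers": ["bronze", "silver"]},
}


def build_run_parameters(table: str, client_id: str) -> dict:
    """Resolve the parameters one run of a table pipeline needs for one client."""
    client = CLIENTS[client_id]
    pipeline = TABLE_PIPELINES[table]
    return {
        "client_id": client.client_id,
        "source_path": f"{client.source_path}/{table}",
        # Three-level names: <catalog>.<schema>.<table>
        "targets": [f"{client.catalog}.{layer}.{table}" for layer in pipeline["layers"]],
    }


def plan_all_runs() -> list:
    """Expand every (table, client) combination into run parameters."""
    return [
        build_run_parameters(table, client_id)
        for table in sorted(TABLE_PIPELINES)
        for client_id in sorted(CLIENTS)
    ]
```

In a real workspace the returned dictionary would be passed to the job or pipeline as parameters; the table logic itself stays in one place per table.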
Platform Features Enabling This Pattern:
- Databricks Workflows: Orchestrate parameterized, multi-tenant pipelines.
- Delta Live Tables (DLT): Declaratively define ETL flows partitioned by client.
- Unity Catalog: Fine-grained access control and catalog/schema separation per client.
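As a rough illustration of the Unity Catalog point, the sketch below generates per-client GRANT statements as plain strings. It assumes one catalog per client with one schema per medallion layer, and the group name is hypothetical. The statements follow the Unity Catalog privilege names as I understand them (`USE CATALOG`, `USE SCHEMA`, `SELECT`), so please check them against the current SQL reference before running anything.

```python
LAYERS = ("bronze", "silver", "gold")


def quote(identifier: str) -> str:
    """Backtick-quote an identifier or principal name."""
    return f"`{identifier}`"


def client_grants(catalog: str, reader_group: str, readable_layers=("gold",)) -> list:
    """Build GRANT statements giving one group read access to selected layers.

    Assumption: each client has its own catalog, with one schema per layer.
    """
    statements = [
        f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {quote(reader_group)};"
    ]
    for layer in readable_layers:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        schema = f"{quote(catalog)}.{quote(layer)}"
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO {quote(reader_group)};")
        statements.append(f"GRANT SELECT ON SCHEMA {schema} TO {quote(reader_group)};")
    return statements


# Example: client analysts can read only their own gold layer.
for statement in client_grants("acme_prod", "acme_readers"):
    print(statement)
```

Because the grants are scoped to the client's own catalog, a shared pipeline can write for every tenant while each tenant's readers see only their own data.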
Extra tips:
Leverage partitioning and schema separation by client within each layer, and use centralized pipelines to tune job frequencies and resource usage.
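One way to picture tuning frequency centrally is a single job definition per table, with one task per client and one schedule for the whole table. The sketch below only builds a dictionary; the field names follow my understanding of the Jobs API create payload (`schedule.quartz_cron_expression`, `tasks[].notebook_task.base_parameters`), while the notebook path, job naming scheme, and client IDs are hypothetical. Compute settings are omitted on purpose.

```python
import json


def build_table_job(table: str, client_ids: list, cron: str, notebook_path: str) -> dict:
    """Build one job definition for a table, with one task per client.

    Assumption: the notebook at `notebook_path` reads `client_id` and `table`
    from its parameters and contains the shared logic for that table.
    """
    return {
        "name": f"medallion_{table}",
        # One schedule per table, so frequency is tuned in a single place.
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
        "tasks": [
            {
                "task_key": f"{table}_{client_id}",
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "base_parameters": {"client_id": client_id, "table": table},
                },
            }
            for client_id in client_ids
        ],
    }


# Example: refresh the orders table hourly for two clients.
orders_job = build_table_job(
    "orders", ["acme", "globex"], "0 0 * * * ?", "/Pipelines/orders"
)
print(json.dumps(orders_job, indent=2))
```

Changing how often a table refreshes then means editing one cron expression rather than one job per client.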
Summary:
Organizing around per-table, multi-tenant pipelines is Databricks' best practice for efficient, standardized, and easily governed medallion data flows at scale.
I hope this helps.
Best,
Sarah