Most suitable Data Promotion orchestration for multi-tenant data lake in Databricks
10-07-2025 07:36 AM
Hi there! I am looking for the most suitable orchestration process to promote data between medallion layers. I need to resolve the following key architectural decision for scaling my multi-tenant data lake in Databricks.
My setup:
- Independent medallion architecture per client (Landing → Bronze → Silver → Gold per client)
- Identical schema across all clients (same data model)
- Multiple tables per layer (each with specific transformations)
What would be the best approach in Databricks to orchestrate the data promotion between layers?
- Independent pipelines per client for all tables
- Independent pipelines per client and table
- Independent pipelines per table for all clients
Thanks in advance.
10-08-2025 06:09 AM
Hey mbanxp!
The most scalable and maintainable orchestration pattern for multi-tenant medallion architectures in Databricks is to build independent pipelines per table for all clients, with each pipeline parameterized by client/tenant.
Why this approach?
- Centralizes business logic for each table (reduces code duplication).
- Makes onboarding new clients easy—just add configuration, don't duplicate pipeline code.
- Scales well as data and client count grow.
- Fits perfectly with Databricks Workflows and Delta Live Tables (DLT), which support parameterized, multi-tenant pipelines and robust orchestration.
- Unity Catalog provides strong data isolation and governance at the client level, even when sharing pipelines.
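As a rough illustration of that parameterized pattern (all names here, the `CLIENTS` config, the catalog/schema layout, and `promote_orders`, are hypothetical, not from this thread):

```python
# Sketch: one promotion routine per table, parameterized by client.
# The clients dict and catalog/schema layout are illustrative only.

CLIENTS = {
    "acme": {"catalog": "acme_lake"},
    "globex": {"catalog": "globex_lake"},
}

def qualified_name(client: str, layer: str, table: str) -> str:
    """Build the Unity Catalog three-level name for a client's table."""
    catalog = CLIENTS[client]["catalog"]
    return f"{catalog}.{layer}.{table}"

def promote_orders(client: str) -> tuple[str, str]:
    """Return (source, target) for the bronze -> silver promotion of 'orders'.

    In a real pipeline, the shared transformation would run here, e.g.
    spark.read.table(source) ... .write.saveAsTable(target).
    """
    source = qualified_name(client, "bronze", "orders")
    target = qualified_name(client, "silver", "orders")
    return source, target

# Onboarding a new client is just one more config entry, no new pipeline code:
CLIENTS["initech"] = {"catalog": "initech_lake"}
```

The point is that the transformation logic for each table lives in exactly one place, and the client only enters as a parameter.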
Platform Features Enabling This Pattern:
- Databricks Workflows: Orchestrate parameterized, multi-tenant pipelines.
- Delta Live Tables (DLT): Declaratively define ETL flows partitioned by client.
- Unity Catalog: Fine-grained access control and catalog/schema separation per client.
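One way this maps onto Databricks Workflows is a single job per table, fanned out into one task per client. The payload below is a hedged sketch: the notebook path and parameter names are assumptions for illustration, not a verified deployment.

```python
# Sketch: build a Workflows-style job payload for one table, with one
# parameterized task per client. Paths and parameter names are assumed.

def build_job_payload(table: str, clients: list[str]) -> dict:
    """Construct a job definition dict: one task per client, all tasks
    sharing the same notebook and differing only in parameters."""
    return {
        "name": f"promote_{table}",
        "tasks": [
            {
                "task_key": f"{table}_{client}",
                "notebook_task": {
                    "notebook_path": "/pipelines/promote_table",
                    "base_parameters": {"client": client, "table": table},
                },
            }
            for client in clients
        ],
    }

payload = build_job_payload("orders", ["acme", "globex"])
```

Submitting such a payload through the Jobs API (or expressing the same shape in a Workflows UI/asset bundle) gives you per-client task isolation while keeping one definition per table.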
Extra tips:
Leverage partitioning and schema separation by client within each layer, and use centralized pipelines to tune job frequencies and resource usage.
Summary:
Organizing by per-table, multi-tenant pipelines is Databricks’ best practice for efficient, standardized, and easily-governed medallion data flows at scale.
I hope this helps.
Best,
Sarah
10-10-2025 02:25 AM
Hi sarahbhord!
Thanks very much for the useful reply; it really helps me understand the best approach to follow. In my case, I have roughly the following architecture:
Based on the approach of independent pipelines per table for all clients, what would be your recommendation?