
Building MultiTenant Architecture on Databricks Platform

rathorer
New Contributor II

This use case demonstrates how a SaaS product can be deployed for multiple customers or business units, ensuring data isolation at every layer through workspace separation, fine-grained access control with Unity Catalog, and secure processing using UDF-based row-level security.

Building a multi-tenant deployment consists of the following steps:

rathorer_0-1753157201307.png

Design Approaches: I will cover two design approaches: isolation through per-tenant catalogs and workspace binding, and isolation at the metastore level.

rathorer_1-1753157291951.png

Architecture Components

1. Cloud Storage Setup (S3/ADLS):

Organize buckets/folders by tenant for strict isolation. Storage isolation requires the following:

  • One bucket (or one dedicated top-level folder, as in the layout below) per tenant (customer).
  • Folder-level segregation by environment or data layer within each tenant's location.
  • External Locations attached to catalogs and mapped to the tenant's storage paths.
  • One IAM role per tenant workspace, used for secure access through Unity Catalog External Locations and Storage Credentials.

s3://my-saas-lakehouse/
├── tenant_a/
│   ├── raw/
│   ├── bronze/
│   ├── silver/
│   └── gold/
├── tenant_b/
│   ├── raw/
│   ├── bronze/
│   ├── silver/
│   └── gold/
└── shared/
    ├── reference_data/
    └── system_logs/
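
The layout above can be sketched as a small Python helper that derives each tenant's folder paths and emits one `CREATE EXTERNAL LOCATION` statement per tenant root. This is only an illustration: the location name (`<tenant>_loc`) and credential name (`<tenant>_cred`) are naming schemes I made up, and the storage credential is assumed to exist already.

```python
LAYERS = ("raw", "bronze", "silver", "gold")


def tenant_root(bucket: str, tenant: str) -> str:
    """Return the top-level storage path that belongs to one tenant."""
    return f"s3://{bucket}/{tenant}/"


def layer_paths(bucket: str, tenant: str) -> dict[str, str]:
    """Return the path of every layer folder under the tenant root."""
    root = tenant_root(bucket, tenant)
    return {layer: f"{root}{layer}/" for layer in LAYERS}


def external_location_ddl(bucket: str, tenant: str) -> str:
    """Build the DDL that registers the tenant root as an external location."""
    # Hypothetical naming: <tenant>_loc for the location, <tenant>_cred for the credential.
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{tenant}_loc` "
        f"URL '{tenant_root(bucket, tenant)}' "
        f"WITH (STORAGE CREDENTIAL `{tenant}_cred`);"
    )


print(external_location_ddl("my-saas-lakehouse", "tenant_a"))
```

Registering one location per tenant root keeps each tenant's credential scoped to its own prefix, so a misconfigured job in one tenant cannot reach another tenant's folders.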

rathorer_2-1753157577943.png

2. Unity Catalog Structure

Create one catalog per tenant:
Catalogs:
- tenant_a_catalog
- tenant_b_catalog
- shared_catalog
Each catalog contains:
Schemas:
- raw
- bronze
- silver
- gold
Attach External Locations to catalogs, map them to storage paths.
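
As a rough sketch of the structure above, the following Python generates the `CREATE CATALOG` and `CREATE SCHEMA` statements for one tenant. It assumes the catalog's managed location is the tenant's folder in the `my-saas-lakehouse` bucket and that an external location already covers that path; both are my assumptions, not something the setup requires.

```python
SCHEMAS = ("raw", "bronze", "silver", "gold")


def catalog_ddl(tenant: str, bucket: str) -> list[str]:
    """Return DDL creating one catalog plus its layer schemas for a tenant."""
    catalog = f"{tenant}_catalog"
    statements = [
        # Assumption: managed data for the catalog lives under the tenant's folder.
        f"CREATE CATALOG IF NOT EXISTS {catalog} "
        f"MANAGED LOCATION 's3://{bucket}/{tenant}/';"
    ]
    statements += [
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema};" for schema in SCHEMAS
    ]
    return statements


for statement in catalog_ddl("tenant_a", "my-saas-lakehouse"):
    print(statement)
```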

3. Workspace Binding

Each customer/tenant has a dedicated Databricks workspace.

Use the Unity Catalog workspace binding feature:

  • Bind only tenant_a_catalog to Tenant A Workspace

  • Bind only tenant_b_catalog to Tenant B Workspace

This ensures:

  • A workspace cannot access any catalog that is not bound to it

  • Security is enforced by design

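Example (command names as in the current Databricks CLI; verify against your CLI version):
# Make the catalog accessible only from explicitly bound workspaces
databricks catalogs update tenant_a_catalog --isolation-mode ISOLATED
# Bind the catalog to the Tenant A workspace (ID 12345); repeat for shared_catalog
databricks workspace-bindings update tenant_a_catalog \
--json '{"assign_workspaces": [12345]}'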
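
To make the isolation rule concrete, here is a minimal in-memory model in Python. It does not call any Databricks API; it only mirrors the rule that a bound catalog is reachable solely from the workspaces it is bound to, and adds a hypothetical audit helper that flags tenant catalogs bound to more than one workspace. Workspace ID 67890 is invented for the example.

```python
# Catalog -> set of workspace IDs it is bound to (67890 is a made-up ID).
BINDINGS = {
    "tenant_a_catalog": {12345},
    "tenant_b_catalog": {67890},
    "shared_catalog": {12345, 67890},
}


def can_access(bindings: dict[str, set[int]], workspace_id: int, catalog: str) -> bool:
    """A bound catalog is reachable only from the workspaces bound to it."""
    return workspace_id in bindings.get(catalog, set())


def multi_workspace_catalogs(
    bindings: dict[str, set[int]],
    shared: frozenset[str] = frozenset({"shared_catalog"}),
) -> list[str]:
    """Return non-shared catalogs bound to more than one workspace."""
    return sorted(
        name
        for name, workspaces in bindings.items()
        if name not in shared and len(workspaces) > 1
    )


print(can_access(BINDINGS, 12345, "tenant_b_catalog"))
print(multi_workspace_catalogs(BINDINGS))
```

A check like `multi_workspace_catalogs` could run against an exported list of bindings as a periodic audit, since a tenant catalog bound to two workspaces would silently break the isolation described here.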

UC Isolation with Workspace Binding:

rathorer_3-1753157853253.png

Environment Isolation with Access pattern per Tenant:

rathorer_4-1753157916598.png

SDLC Setup with UC per Tenant

rathorer_5-1753157967028.png

 

Approach

  • For every tenant, set up all related SDLC workspaces (DEV, STG, PRD, …):
    • Isolate the environments at the catalog level of Unity Catalog's 3-level namespace. Assign the DEV, STG and PRD workspaces to their respective catalogs only.
    • Isolate the DEV, STG and PRD data locations by assigning dedicated managed buckets/containers to the catalogs.
    • Isolate admin scope by delegating administration of the catalogs to different admins for DEV, STG and PRD.
  • Catalog names can combine SDLC and business/organizational unit names, e.g. sales_dev, sales_prd, engineering_dev.
  • Access to workspaces, clusters and endpoints needs to be configured accordingly.
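
The naming convention above (unit name plus SDLC environment) can be sketched as a small Python planner that lists every catalog together with the single workspace it would be assigned to. The workspace naming pattern (`<unit>-<env>-ws`) is hypothetical and only serves the illustration.

```python
from itertools import product

ENVIRONMENTS = ("dev", "stg", "prd")


def sdlc_catalogs(units: list[str]) -> dict[str, dict[str, str]]:
    """Map each catalog name (<unit>_<env>) to its unit, environment and workspace."""
    return {
        f"{unit}_{env}": {
            "unit": unit,
            "env": env,
            # Hypothetical workspace naming; each catalog maps to exactly one workspace.
            "workspace": f"{unit}-{env}-ws",
        }
        for unit, env in product(units, ENVIRONMENTS)
    }


plan = sdlc_catalogs(["sales", "engineering"])
for catalog, info in plan.items():
    print(catalog, "->", info["workspace"])
```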

Central Analytics:

Use this approach when centralized analytics is required across all tenants. It adds a separate central workspace and catalog, and an ETL process moves data into that catalog. A data agreement must be in place with every tenant before this process is set up, and only the agreed data sets should flow into the central catalog. A proper security framework should govern both the ETL processing and consumption from the central catalog. The approach covers the following tasks:

Data Production

  • Central Ingest & ETL by the central BU
  • Other BUs create (business) data sets

Data Publishing

  • Central BU publishes data in central PRD storage and into the PRD catalog of the Central BU in UC
  • BUs request that the central team publish data from their PRD storage to central PRD storage and into the BU catalog in UC that the central team maintains

Data Governance (centralized)

  • The central team and each BU (for non-published data) can work independently on their own catalogs
  • Central team applies additional quality assurance and maintains ACLs in the central BU catalog

Data Consumption

  • Published data will be discovered in the Central catalog and consumed from the central PRD storage

Platform operations

  • Central team provides platform blueprints, creates environments for BUs (automated)
  • Central team could provide common data services
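
The data-agreement rule for the central catalog (only agreed data sets may flow in) amounts to an allow-list check before publishing. A minimal Python sketch, with BU names and data set names invented for the example:

```python
def publishable(
    requested: dict[str, list[str]],
    agreements: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Keep only the data sets each BU has agreed to publish centrally."""
    return {
        bu: [name for name in datasets if name in agreements.get(bu, set())]
        for bu, datasets in requested.items()
    }


# Hypothetical publish requests and signed agreements.
requested = {"bu1": ["orders", "customers"], "bu2": ["invoices"]}
agreements = {"bu1": {"orders"}}

print(publishable(requested, agreements))
```

A BU without any agreement ends up with an empty list, so nothing is published for it by default.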

rathorer_6-1753158483417.png

Here BU1 and BU2 represent the tenants, and no access across BUs should be granted.

How the metadata management should look:

rathorer_7-1753158582789.png

In this architecture, secure access to data is the key factor. No cross-tenant data access may be allowed under any circumstances.

4. Secure Data Processing

In each tenant workspace:

  • Setup ETL/ELT jobs using Databricks Jobs or Workflows

  • Use Delta Live Tables (DLT) for managing CDC/incremental pipelines

  • Enforce row-level security in Delta tables using UDFs + UC GRANTS

5. Security Controls

a. Row-Level Security via SQL UDFs

Create a SQL UDF that checks whether the current user is mapped to the row's tenant:
CREATE FUNCTION shared_catalog.security_fn.tenant_row_filter(row_tenant_id STRING)
RETURNS BOOLEAN
RETURN EXISTS (
SELECT 1
-- assumes the mapping table lives in the security_fn schema
FROM shared_catalog.security_fn.security_mapping m
WHERE m.user_email = current_user()
AND m.tenant_id = row_tenant_id
);
Then apply the filter to the table, passing the tenant_id column as the argument:
ALTER TABLE tenant_a_catalog.silver.orders
SET ROW FILTER shared_catalog.security_fn.tenant_row_filter
ON (tenant_id);

This ensures:

  • Even if someone queries a tenant table from a shared context, only matching rows are visible.

  • No need to duplicate logic across tenants.
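
The intended effect of the row filter can be illustrated in plain Python. This is not how Databricks evaluates the filter; it only mimics the lookup against the `security_mapping` table so the expected result is easy to see. The user emails and order rows are made up.

```python
# Stand-in for the security_mapping table (sample users are hypothetical).
SECURITY_MAPPING = [
    {"user_email": "alice@tenant-a.example", "tenant_id": "tenant_a"},
    {"user_email": "bob@tenant-b.example", "tenant_id": "tenant_b"},
]


def tenant_row_filter(current_user: str, row_tenant_id: str) -> bool:
    """True when the mapping links the current user to the row's tenant."""
    return any(
        entry["user_email"] == current_user and entry["tenant_id"] == row_tenant_id
        for entry in SECURITY_MAPPING
    )


def visible_rows(current_user: str, rows: list[dict]) -> list[dict]:
    """Return only the rows the filter lets this user see."""
    return [row for row in rows if tenant_row_filter(current_user, row["tenant_id"])]


orders = [
    {"order_id": 1, "tenant_id": "tenant_a"},
    {"order_id": 2, "tenant_id": "tenant_b"},
]
print(visible_rows("alice@tenant-a.example", orders))
```

Note that a user missing from the mapping sees no rows at all, which is the safe default for a multi-tenant table.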

b. Schema/Column Masking

This is useful when access must be differentiated by group within a single tenant:

ALTER TABLE tenant_a_catalog.gold.customer_info
ALTER COLUMN ssn
SET MASK shared_catalog.security_fn.mask_ssn;
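
The post does not show the body of `mask_ssn`, so the following Python is only one plausible behaviour for such a masking function: members of a privileged group see the full value, everyone else sees only the last four digits. The group name `group_tenant_a_pii_readers` is hypothetical.

```python
def mask_ssn(
    ssn: str,
    user_groups: set[str],
    privileged_group: str = "group_tenant_a_pii_readers",  # hypothetical group
) -> str:
    """Return the SSN unchanged for privileged users, otherwise a masked form."""
    if privileged_group in user_groups:
        return ssn
    # Keep only the last four characters visible.
    return "***-**-" + ssn[-4:]


print(mask_ssn("123-45-6789", {"group_tenant_a_users"}))
```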

Security Control – Account Mapping for Access Control

rathorer_10-1753159394385.png

 

6. User Access Management

  • Use SCIM provisioning or APIs to automate user/group creation per tenant workspace

  • Add users to tenant-specific groups:

group_tenant_a_users → Access only tenant_a_catalog
group_tenant_b_users → Access only tenant_b_catalog

Grant permissions:
GRANT USE CATALOG ON CATALOG tenant_a_catalog TO `group_tenant_a_users`;
GRANT USE SCHEMA ON SCHEMA tenant_a_catalog.gold TO `group_tenant_a_users`;
GRANT SELECT ON TABLE tenant_a_catalog.gold.orders TO `group_tenant_a_users`;
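
Since the same grants repeat for every tenant, they are easy to generate. The Python sketch below builds the statements from the tenant name, following the `group_<tenant>_users` convention above and using the Unity Catalog privilege names `USE CATALOG` and `USE SCHEMA`; the function itself is illustrative, not an existing tool.

```python
def tenant_grants(tenant: str, schema: str, tables: list[str]) -> list[str]:
    """Build the GRANT statements a tenant's user group needs to read tables."""
    catalog = f"{tenant}_catalog"
    group = f"`group_{tenant}_users`"
    grants = [
        # Reading a table also requires access to its parent catalog and schema.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {group};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {group};",
    ]
    grants += [
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {group};"
        for table in tables
    ]
    return grants


for statement in tenant_grants("tenant_a", "gold", ["orders"]):
    print(statement)
```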

To summarize, the overall security process follows these steps:

  • Secure Data Processing: each tenant workspace includes:
    • ETL/ELT jobs set up using Databricks Jobs or Workflows
    • Row-level security enforced in Delta tables using UDFs + UC GRANTS
  • Security Control
    • Row-level security via UDF
    • Schema/column masking
  • User Access Management by an external IdP
    • Use SCIM provisioning or APIs to automate user/group creation per tenant workspace
    • Add users to tenant-specific groups and grant permissions

rathorer_9-1753159307147.png

Summary of Isolation Techniques

Layer | Technique
Storage | Folder-level + Storage Credential isolation
Compute | Workspace-level isolation
Data Access | Unity Catalog bindings + RBAC
Row-level access | UDF-based row filter
Column masking | Data masking policies
Auditing | Unity Catalog audit logs

 

Isolation at Metastore level

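Approach:

  • A single metastore per tenant. Databricks generally limits an account to one Unity Catalog metastore per region, so in practice this implies a separate account or region per tenant.
  • Create catalogs under the tenant-specific metastore. Isolate environment-specific catalogs (at least two: prod and a lower environment, as per standard).
  • Use workspace-to-catalog binding to segregate processing and access control for the different environments.
  • Security:
    • Storage-level access
    • Schema-level isolation if required
    • Create users/groups and implement RLS/CLS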

rathorer_11-1753159666798.png

Single and Multi Metastore Comparison Analysis

 

Feature / Concern | Single Metastore (Shared) | Multi-Metastore (One per Tenant)
Metastore Isolation | Shared across all tenants | Fully isolated
Data Isolation | Requires RLS, masking | Natural hard boundary
Workspace Binding | Catalog-level binding | Bound to one workspace
Cross-Tenant Access Risk | Higher risk | Very low risk
Access Control Complexity | High (groups, UDFs) | Lower (per metastore)
Row-Level Security (RLS) | Mandatory | Needed when user group level Access Required
Storage Layer | Bucket Level Isolation | Different Storage Account
Metastore Admins | See all data unless filtered | Per-tenant control
Scalability | Supports 100+ tenants | Practical < 50 tenants
Governance Overhead | High | Lower as Cross Tenant access is not feasible.
Backup / DR | Global for all tenants | Per-tenant plan
Cost Management | Hard to split, would need tag based processing | Per-tenant tracking
Use Case Fit | B2B SaaS, shared analytics | Regulated industries. Cross Tenant Access for Admin is restricted.



