Multi-tenant recommendation system (Machine learning)

Kasen — Fri, 06 Sep 2024 04:27:46 GMT

Hello,

I am looking to build a multi-tenant machine learning recommender system in Azure Databricks. The idea is to have a single shared model architecture that each tenant trains on its own dataset. In other words, the architecture stays the same for all tenants, while the data used for training and inference is specific to each one. Are there any resources I can refer to, or best practices for implementing such a system? Thank you!

Re: Multi-tenant recommendation system (Machine learning)

Louis_Frolio — Wed, 29 Oct 2025 10:29:40 GMT

@Kasen, sorry for the delayed response. Here are some things to consider regarding your question.

Azure Databricks is well-suited for a shared-architecture, tenant‑isolated recommender system. Below is a pragmatic blueprint, the isolation model options, and concrete best practices with Databricks-native services you can adopt.

Recommended multi-tenant architecture on Azure Databricks

Use Unity Catalog (UC) as the governance backbone with a single metastore per region and isolate tenants at the catalog or schema level (preferred over multiple metastores).
Bind catalogs and storage credentials to specific workspaces if you need environment isolation (e.g., dev vs prod and tenant-specific endpoints) while retaining centralized governance across the region.
Run shared compute safely with Lakeguard to enforce data governance at runtime on multi-user clusters and SQL warehouses; this lets you share cost-efficient compute without relaxing isolation controls.
For cost attribution and noisy-neighbor avoidance, prefer compute-per-tenant (dedicated job clusters or per-tenant serverless concurrency) even if data governance is centralized in UC.
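As a rough illustration of the catalog-per-tenant layout, here is a minimal sketch that generates the Unity Catalog SQL for one tenant. The `tenant_<id>` catalog naming, the `recsys` schema, and the group name are assumptions made up for this example, not Databricks conventions.

```python
import re

# Hypothetical rule for tenant ids: lowercase identifier, safe to embed in SQL.
_TENANT_RE = re.compile(r"^[a-z][a-z0-9_]{0,62}$")


def tenant_catalog_sql(tenant: str, group: str) -> list[str]:
    """Return SQL statements that create and grant one catalog for a tenant."""
    if not _TENANT_RE.match(tenant):
        raise ValueError(f"unsafe tenant id: {tenant!r}")
    if "`" in group:
        raise ValueError("group name must not contain backticks")
    catalog = f"tenant_{tenant}"  # assumed naming scheme
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.recsys",
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {catalog}.recsys TO `{group}`",
    ]


if __name__ == "__main__":
    for stmt in tenant_catalog_sql("acme", "acme-data-scientists"):
        print(stmt)
```

In a notebook you would typically pass each statement to `spark.sql(...)`; generating the statements separately makes them easy to review or keep in source control before they run.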

Isolation controls and governance

Use catalog-per-tenant (preferred) or schema-per-tenant in a shared workspace; both patterns give strong isolation with simpler operations than workspace-per-tenant, which is also capped by the 250-workspace hard limit.
Apply workspace–catalog binding and credential binding to workspaces to constrain where production data is accessible and to segment endpoints and identities per environment or tenant.
Leverage row/column‑level security and ABAC for finer-grained controls where needed; UC supports policy-based filtering and masking across governed tables.
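If some tables end up shared across tenants (for example a common events table with a tenant column), a row filter is one way to apply the row-level security mentioned above. The sketch below only builds the SQL; the `tenant_id` column, the `tenant_row_filter` function name, and the `tenant_<id>` account-group convention are assumptions for illustration.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def row_filter_sql(table: str, column: str = "tenant_id") -> list[str]:
    """Build SQL that restricts rows of `table` to members of the matching tenant group.

    `table` must be a three-level name: catalog.schema.table.
    """
    parts = table.split(".")
    if len(parts) != 3 or not all(_IDENT.match(p) for p in parts):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    if not _IDENT.match(column):
        raise ValueError(f"unsafe column name: {column!r}")
    catalog, schema, _ = parts
    fn = f"{catalog}.{schema}.tenant_row_filter"  # hypothetical function name
    return [
        # A row is visible only if the caller belongs to the group 'tenant_<value>'.
        f"CREATE OR REPLACE FUNCTION {fn}(tenant STRING) "
        f"RETURN is_account_group_member(concat('tenant_', tenant))",
        f"ALTER TABLE {table} SET ROW FILTER {fn} ON ({column})",
    ]


if __name__ == "__main__":
    for stmt in row_filter_sql("shared.recsys.events"):
        print(stmt)
```

With catalog-per-tenant this is usually unnecessary, since the catalog grants already separate the data; it matters mainly for tables that several tenants read.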

Feature engineering and serving

Use Databricks Feature Store in Unity Catalog to register feature tables and models with governance, lineage, and cross-workspace discovery; training automatically tracks feature lineage, and inference can auto‑lookup features to prevent training/serving skew.
For low-latency online inference, enable Online Feature Stores (Lakebase‑powered) and publish per‑tenant feature tables (latest values or full time series as needed).
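To make the feature lookup idea concrete, here is a hedged sketch of building a per-tenant training set. The table names, key columns, and `clicked` label are invented for the example; the `databricks-feature-engineering` client is imported lazily so the naming helper can be used (and tested) outside a Databricks runtime.

```python
def tenant_feature_specs(tenant: str) -> list[dict]:
    """Describe which per-tenant feature tables to join, as plain dicts."""
    catalog = f"tenant_{tenant}"  # assumed catalog-per-tenant naming
    return [
        {"table_name": f"{catalog}.recsys.user_features", "lookup_key": "user_id"},
        {"table_name": f"{catalog}.recsys.item_features", "lookup_key": "item_id"},
    ]


def build_training_set(tenant: str, labels_df, label: str = "clicked"):
    """Join labels with the tenant's feature tables (only runs on Databricks)."""
    # Imported here because the package is only available on Databricks compute.
    from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

    fe = FeatureEngineeringClient()
    lookups = [FeatureLookup(**spec) for spec in tenant_feature_specs(tenant)]
    return fe.create_training_set(df=labels_df, feature_lookups=lookups, label=label)


if __name__ == "__main__":
    for spec in tenant_feature_specs("acme"):
        print(spec)
```

Because only the catalog name changes between tenants, the same training code can be reused for every tenant while each one reads only its own feature tables.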

Model lifecycle per tenant

Keep a single model architecture (e.g., Two‑Tower retrieval plus DLRM re‑ranking) and register each tenant’s model/version in UC under that tenant’s catalog/schema using MLflow.
For scalable training, use TorchDistributor with Mosaic StreamingDataset (and TorchRec for sharded embeddings) to handle millions of users/items efficiently on multi‑GPU clusters/serverless GPU.
If you’re earlier in the journey, Databricks solution accelerators provide wide‑and‑deep, ALS, market‑basket, image similarity notebooks to bootstrap tenant builds on a common codebase.
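A small sketch of the "same architecture, different weights" registration pattern: every tenant gets the same model name under its own catalog. The `tenant_<id>.recsys.two_tower` naming and the `model` artifact path are assumptions; `mlflow` is imported lazily so the naming helper works without it installed.

```python
def tenant_model_name(tenant: str, model: str = "two_tower") -> str:
    """Three-level Unity Catalog name for a tenant's copy of the shared architecture."""
    return f"tenant_{tenant}.recsys.{model}"


def register_tenant_model(tenant: str, run_id: str, artifact_path: str = "model"):
    """Register the model logged in an MLflow run under the tenant's catalog."""
    import mlflow  # imported lazily; needs an MLflow-enabled environment

    mlflow.set_registry_uri("databricks-uc")  # target the Unity Catalog registry
    return mlflow.register_model(
        model_uri=f"runs:/{run_id}/{artifact_path}",
        name=tenant_model_name(tenant),
    )


if __name__ == "__main__":
    print(tenant_model_name("acme"))
```

Keeping the model name identical across tenants means downstream code (serving, evaluation) only needs the tenant id to find the right model.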

Inference, A/B testing, and monitoring

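Serve tenant models with Mosaic AI Model Serving. You can either deploy one endpoint per tenant or use a multi-model endpoint (served_entities). Traffic splitting on such an endpoint is percentage-based, which suits challenger vs current A/B tests; to route a specific tenant to a specific model on a shared endpoint, address that served entity explicitly rather than relying on the split.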
For high‑QPS/low‑latency tenants, enable route optimization (dedicated URL + OAuth) to reduce overhead latency and raise QPS versus standard endpoints.
Turn on AI Gateway usage tracking and inference tables for each endpoint to log requests/responses to a UC Delta table for evaluation, drift monitoring, and corpus creation for fine‑tuning or re‑rankers.
Apply rate limits (endpoint, user, group) to protect shared capacity across tenants; monitor limits and regions with the Serving limits/regions guide.
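For the challenger vs current setup, the request body for creating a serving endpoint might look roughly like the sketch below. Field names follow the serving endpoints REST payload as I understand it (`served_entities`, `traffic_config.routes`); the endpoint name, entity name, and workload size are placeholder assumptions, so check them against the current API docs before using this.

```python
def ab_endpoint_config(tenant: str, current_version: int,
                       challenger_version: int, challenger_pct: int = 10) -> dict:
    """Build a per-tenant endpoint payload with a current/challenger traffic split."""
    if not 0 < challenger_pct < 100:
        raise ValueError("challenger_pct must be between 1 and 99")
    entity = f"tenant_{tenant}.recsys.two_tower"  # assumed UC model name

    def served(name: str, version: int) -> dict:
        return {
            "name": name,
            "entity_name": entity,
            "entity_version": str(version),
            "workload_size": "Small",        # placeholder sizing
            "scale_to_zero_enabled": True,
        }

    return {
        "name": f"recsys-{tenant}",          # hypothetical endpoint name
        "config": {
            "served_entities": [
                served("current", current_version),
                served("challenger", challenger_version),
            ],
            "traffic_config": {
                "routes": [
                    {"served_model_name": "current",
                     "traffic_percentage": 100 - challenger_pct},
                    {"served_model_name": "challenger",
                     "traffic_percentage": challenger_pct},
                ]
            },
        },
    }


if __name__ == "__main__":
    import json
    print(json.dumps(ab_endpoint_config("acme", 3, 4), indent=2))
```

The dict can be sent to the serving endpoints API or adapted to the Databricks SDK; one endpoint per tenant keeps rate limits and inference tables separated by tenant as well.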

Cross-region or cross-organization sharing

Keep one UC metastore per region; share data across regions/orgs with Databricks-to-Databricks Delta Sharing (the recipient mounts the share as a catalog), noting that lineage and ACLs don't cross the share boundary and must be re-applied on the recipient side.
If you need governed open sharing to external tools (e.g., Power BI), use OIDC federation for Delta Sharing to avoid long‑lived bearer tokens and retain MFA/IdP policy enforcement.

Cost, quotas, and limits

Treat compute as the attribution layer (per‑tenant clusters/concurrency), and use serverless budget policies and tags for granular billing.
Review UC quotas and request increases if needed (e.g., large numbers of catalogs, tables, or models per tenant) with the UC quota SOP.
Check Model Serving limits (QPS, payload, concurrency, compliance) and route optimization requirements when designing endpoints at scale.
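One simple way to get per-tenant cost attribution is to tag each tenant's job cluster and group billing usage by that tag. This is a sketch under assumptions: the tag keys, node type, and runtime version are placeholders, and the query assumes the `system.billing.usage` system table is enabled in your account.

```python
def tenant_job_cluster(tenant: str, node_type: str = "Standard_DS3_v2",
                       workers: int = 2) -> dict:
    """Job cluster spec whose custom tags carry the tenant id for billing."""
    return {
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": node_type,            # placeholder Azure VM type
        "num_workers": workers,
        "custom_tags": {"tenant": tenant, "workload": "recsys-training"},
    }


# Usage per tenant, read from the billing system table via its tag map.
USAGE_BY_TENANT_SQL = """
SELECT custom_tags['tenant'] AS tenant,
       usage_unit,
       SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE custom_tags['tenant'] IS NOT NULL
GROUP BY custom_tags['tenant'], usage_unit
""".strip()


if __name__ == "__main__":
    print(tenant_job_cluster("acme"))
    print(USAGE_BY_TENANT_SQL)
```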

External access patterns and guardrails

Avoid external systems writing to the same tables outside Databricks, as UC doesn’t govern direct object‑store writes; use managed tables or explicit external‑volume patterns and credential vending to preserve consistency and security.

Concrete blueprint (step-by-step)

Identity and governance: Provision principals via SCIM at the account level, enable UC, create a catalog per tenant, and bind catalogs/credentials to the correct workspaces and environments (dev/stg/prod).
Data ingestion and isolation: Land each tenant’s data into their catalog/schema, applying RLS/CLS or ABAC where needed; use Lakeguard on shared compute clusters to enforce governance at runtime.
Feature engineering: Build tenant feature tables in UC, track lineage, and publish hot features to Online Feature Stores for low-latency inference.
Model training: Use common repos/notebooks with TorchDistributor/Mosaic Streaming for Two‑Tower retrieval and DLRM reranking; register each tenant’s model in UC (same architecture, different weights), tracked by MLflow.
Model serving: Create per-tenant endpoints or multi‑model endpoints with traffic split and route optimization; enable AI Gateway usage tracking, rate limits, and inference tables for monitoring and A/B testing.
Cross-region access (optional): Use D2D Delta Sharing and re‑grant ACLs in the recipient catalog; don’t attempt cross‑region metastore assignment.

Resources to read and use

What is Unity Catalog and Azure UC best practices (metastore per region, isolation at catalog/schema, workspace binding).
Isolation in Multi‑Tenant Applications (catalog/schema vs workspace per tenant; compute-per-tenant guidance).
Unity Catalog Lakeguard overview for multi-user governance on shared compute.
Feature Store in UC and Online Feature Stores (setup, auto feature lookup, online serving patterns).
Model Serving docs: create endpoints, multi‑model traffic splitting, route optimization, usage tracking, inference tables, limits/regions.
Delta Sharing architecture and OIDC federation (cross‑region/org data sharing patterns).
Recommender systems on Databricks: Two‑Tower, DLRM, wide‑and‑deep, ALS, accelerators and blogs.

Hope this helps, Louis.
