@Kasen, sorry for the delayed response. Here are some things to consider regarding your question.
Azure Databricks is well-suited for a shared-architecture, tenant-isolated recommender system. Below is a pragmatic blueprint, the isolation model options, and concrete best practices with Databricks-native services you can adopt.
Recommended multi-tenant architecture on Azure Databricks
- Use Unity Catalog (UC) as the governance backbone with a single metastore per region and isolate tenants at the catalog or schema level (preferred over multiple metastores).
- Bind catalogs and storage credentials to specific workspaces if you need environment isolation (e.g., dev vs prod and tenant-specific endpoints) while retaining centralized governance across the region.
- Run shared compute safely with Lakeguard to enforce data governance at runtime on multi-user clusters and SQL warehouses; this lets you share cost-efficient compute without relaxing isolation controls.
- For cost attribution and noisy-neighbor avoidance, prefer compute-per-tenant (dedicated job clusters or per-tenant serverless concurrency) even if data governance is centralized in UC.
Isolation controls and governance
- Use catalog-per-tenant (preferred) or schema-per-tenant in a shared workspace; both patterns give strong isolation with simpler operations than workspace-per-tenant (250 workspace hard limit).
- Apply workspace-catalog binding and credential binding to workspaces to constrain where production data is accessible and to segment endpoints and identities per environment or tenant.
- Leverage row/column-level security and ABAC for finer-grained controls where needed; UC supports policy-based filtering and masking across governed tables.
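To make the catalog-per-tenant pattern concrete, here is a minimal sketch that only generates the Unity Catalog SQL for onboarding one tenant. The `tenant_<id>` naming scheme, the schema names, and the reader group are assumptions for illustration, not something Databricks prescribes; you would run the resulting statements yourself (for example via `spark.sql` in a notebook).

```python
import re


def _quote(identifier: str) -> str:
    """Backtick-quote a SQL identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def tenant_catalog_sql(tenant_id, reader_group, schemas=("raw", "features", "models")):
    """Return the SQL statements that create and grant access to one tenant catalog.

    The naming convention (tenant_<id>) and the schema list are hypothetical.
    """
    if not re.fullmatch(r"[a-z][a-z0-9_]*", tenant_id):
        raise ValueError(f"unsupported tenant id: {tenant_id!r}")

    catalog = _quote(f"tenant_{tenant_id}")
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    for schema in schemas:
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{_quote(schema)}")
    # Privileges granted on the catalog are inherited by its schemas and tables.
    statements.append(
        f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {catalog} TO {_quote(reader_group)}"
    )
    return statements


if __name__ == "__main__":
    for stmt in tenant_catalog_sql("acme", "acme-readers"):
        print(stmt + ";")
```

Because each tenant only ever receives grants on its own catalog, a mistake in one tenant's onboarding cannot expose another tenant's tables.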
Feature engineering and serving
- Use Databricks Feature Store in Unity Catalog to register feature tables and models with governance, lineage, and cross-workspace discovery; training automatically tracks feature lineage, and inference can auto-lookup features to prevent training/serving skew.
- For low-latency online inference, enable Online Feature Stores (Lakebase-powered) and publish per-tenant feature tables (latest values or full time series as needed).
Model lifecycle per tenant
- Keep a single model architecture (e.g., Two-Tower retrieval plus DLRM re-ranking) and register each tenant's model/version in UC under that tenant's catalog/schema using MLflow.
- For scalable training, use TorchDistributor with Mosaic StreamingDataset (and TorchRec for sharded embeddings) to handle millions of users/items efficiently on multi-GPU clusters/serverless GPU.
- If you're earlier in the journey, Databricks solution accelerators provide wide-and-deep, ALS, market-basket, and image similarity notebooks to bootstrap tenant builds on a common codebase.
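As a small illustration of "same architecture, different weights per tenant", the sketch below derives the three-level UC name under which each tenant's model would be registered. The `tenant_<id>` catalog prefix, the `models` schema, and the `recommender` model name are hypothetical conventions; the actual registration would go through MLflow with the registry pointed at Unity Catalog.

```python
def tenant_model_name(tenant_id, schema="models", model="recommender"):
    """Build the catalog.schema.model name for one tenant's registered model.

    All three naming parts are assumptions chosen for this example.
    """
    parts = (f"tenant_{tenant_id}", schema, model)
    for part in parts:
        # Keep names to letters, digits, and underscores so no quoting is needed.
        if not part.replace("_", "").isalnum():
            raise ValueError(f"unsupported name part: {part!r}")
    return ".".join(parts)


if __name__ == "__main__":
    # In a Databricks job you would pass this name to MLflow, e.g.
    # mlflow.set_registry_uri("databricks-uc") followed by
    # mlflow.register_model(model_uri, tenant_model_name("acme")).
    for tenant in ("acme", "globex"):
        print(tenant_model_name(tenant))
```

Keeping the model name identical across tenants and varying only the catalog lets one training pipeline be parameterized by tenant id.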
Inference, A/B testing, and monitoring
- Serve tenant models with Mosaic AI Model Serving. You can either deploy one endpoint per tenant or use a multi-model endpoint (served_entities) with traffic splitting to route per-tenant traffic or run challenger vs current for A/B tests.
- For high-QPS/low-latency tenants, enable route optimization (dedicated URL + OAuth) to reduce overhead latency and raise QPS versus standard endpoints.
- Turn on AI Gateway usage tracking and inference tables for each endpoint to log requests/responses to a UC Delta table for evaluation, drift monitoring, and corpus creation for fine-tuning or re-rankers.
- Apply rate limits (endpoint, user, group) to protect shared capacity across tenants; monitor limits and regions with the Serving limits/regions guide.
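The challenger-vs-current split can be sketched as a plain request payload. This is only an illustration: the field names (`served_entities`, `traffic_config`, `routes`, `served_model_name`, `traffic_percentage`) follow my understanding of the Model Serving REST API and should be checked against the current docs, while the endpoint name, model name, and workload size are made-up values.

```python
import json


def ab_endpoint_payload(tenant_id, current_version, challenger_version, challenger_pct=10):
    """Build a create-endpoint payload that splits traffic between two model versions.

    Names such as recs-<tenant> and tenant_<id>.models.recommender are hypothetical.
    """
    if not 0 <= challenger_pct <= 100:
        raise ValueError("challenger_pct must be between 0 and 100")

    model_name = f"tenant_{tenant_id}.models.recommender"
    variants = (
        ("current", current_version, 100 - challenger_pct),
        ("challenger", challenger_version, challenger_pct),
    )

    served_entities, routes = [], []
    for label, version, pct in variants:
        served_name = f"{label}-v{version}"
        served_entities.append({
            "name": served_name,
            "entity_name": model_name,
            "entity_version": str(version),
            "workload_size": "Small",          # assumed size for the example
            "scale_to_zero_enabled": False,
        })
        routes.append({"served_model_name": served_name, "traffic_percentage": pct})

    return {
        "name": f"recs-{tenant_id}",
        "config": {
            "served_entities": served_entities,
            "traffic_config": {"routes": routes},
        },
    }


if __name__ == "__main__":
    print(json.dumps(ab_endpoint_payload("acme", 3, 4, challenger_pct=20), indent=2))
```

Generating the payload from a function keeps the percentages summing to 100 by construction, which is easy to get wrong when editing JSON by hand for many tenants.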
Cross-region or cross-organization sharing
- Keep one UC metastore per region; share data across regions/orgs with Databricks-to-Databricks Delta Sharing (foreign catalogs), noting lineage/ACLs don't cross the share boundary and must be re-applied in the recipient.
- If you need governed open sharing to external tools (e.g., Power BI), use OIDC federation for Delta Sharing to avoid long-lived bearer tokens and retain MFA/IdP policy enforcement.
Cost, quotas, and limits
- Treat compute as the attribution layer (per-tenant clusters/concurrency), and use serverless budget policies and tags for granular billing.
- Review UC quotas and request increases if needed (e.g., large numbers of catalogs, tables, or models per tenant) with the UC quota SOP.
- Check Model Serving limits (QPS, payload, concurrency, compliance) and route optimization requirements when designing endpoints at scale.
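Tag-based attribution could look roughly like the sketch below: every per-tenant cluster gets the same small set of tags, and usage is later rolled up by the `tenant` tag. The tag keys are my own choice, and the usage rows are invented sample data shaped loosely like billing usage records (a `custom_tags` map plus a `usage_quantity`); treat both as assumptions.

```python
from collections import defaultdict


def tenant_tags(tenant_id, environment):
    """Tags to attach to a tenant's compute (e.g., as cluster custom tags)."""
    return {"tenant": tenant_id, "environment": environment}


def usage_by_tenant(usage_rows):
    """Sum usage_quantity per tenant tag; untagged usage is grouped separately."""
    totals = defaultdict(float)
    for row in usage_rows:
        tenant = row.get("custom_tags", {}).get("tenant", "untagged")
        totals[tenant] += row["usage_quantity"]
    return dict(totals)


if __name__ == "__main__":
    # Invented sample rows, not real billing data.
    rows = [
        {"custom_tags": tenant_tags("acme", "prod"), "usage_quantity": 1.5},
        {"custom_tags": tenant_tags("acme", "prod"), "usage_quantity": 2.5},
        {"custom_tags": tenant_tags("globex", "prod"), "usage_quantity": 3.0},
        {"custom_tags": {}, "usage_quantity": 0.5},
    ]
    print(usage_by_tenant(rows))
```

Surfacing an explicit "untagged" bucket is a cheap way to notice shared or misconfigured compute that would otherwise be unattributable.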
External access patterns and guardrails
- Avoid external systems writing to the same tables outside Databricks, as UC doesn't govern direct object-store writes; use managed tables or explicit external-volume patterns and credential vending to preserve consistency and security.
Concrete blueprint (step-by-step)
- Identity and governance: Provision principals via SCIM at the account level, enable UC, create a catalog per tenant, and bind catalogs/credentials to the correct workspaces and environments (dev/stg/prod).
- Data ingestion and isolation: Land each tenant's data into their catalog/schema, applying RLS/CLS or ABAC where needed; use Lakeguard on shared compute clusters to enforce governance at runtime.
- Feature engineering: Build tenant feature tables in UC, track lineage, and publish hot features to Online Feature Stores for low-latency inference.
- Model training: Use common repos/notebooks with TorchDistributor/Mosaic Streaming for Two-Tower retrieval and DLRM reranking; register each tenant's model in UC (same architecture, different weights), tracked by MLflow.
- Model serving: Create per-tenant endpoints or multi-model endpoints with traffic split and route optimization; enable AI Gateway usage tracking, rate limits, and inference tables for monitoring and A/B testing.
- Cross-region access (optional): Use D2D Delta Sharing and re-grant ACLs in the recipient catalog; don't attempt cross-region metastore assignment.
Resources to read and use
- What is Unity Catalog and Azure UC best practices (metastore per region, isolation at catalog/schema, workspace binding).
- Isolation in Multi-Tenant Applications (catalog/schema vs workspace per tenant; compute-per-tenant guidance).
- Unity Catalog Lakeguard overview for multi-user governance on shared compute.
- Feature Store in UC and Online Feature Stores (setup, auto feature lookup, online serving patterns).
- Model Serving docs: create endpoints, multi-model traffic splitting, route optimization, usage tracking, inference tables, limits/regions.
- Delta Sharing architecture and OIDC federation (cross-region/org data sharing patterns).
- Recommender systems on Databricks: Two-Tower, DLRM, wide-and-deep, ALS, accelerators and blogs.
Hope this helps, Louis.