2 weeks ago
Hi,
I am using a medallion architecture on Azure Data Lake Storage Gen2 with Azure Databricks. Currently, I am storing data in Parquet format (not Delta tables), and I am planning to implement Unity Catalog (UC).
As part of this setup, I understand that catalogs and schemas in UC require external locations. From an architecture and governance perspective, I am considering the following approaches:
Option 1: Single container for entire catalog
One container for the catalog
Separate folders inside the container for bronze, silver, and gold layers
Results in 4 external locations (1 for catalog + 3 for layers)
Data is logically separated (via folders), not physically (via containers)
Option 2: Three containers for layers, catalog within bronze
Separate containers for bronze, silver, and gold
Catalog stored inside the bronze container (in a separate folder)
Results in 4 external locations
Concern: mixes catalog storage with bronze layer, which may not align well with medallion principles
Option 3: Four separate containers
Separate containers for catalog, bronze, silver, and gold
Results in 4 external locations
Provides clear physical separation, but increases IAM and governance overhead
Question:
Which of these approaches is considered best practice from a scalability, governance, and Unity Catalog design perspective? Are there any recommended patterns for structuring storage and external locations when using UC with a medallion architecture?
2 weeks ago
Recommended high‑level pattern
How that maps to your three options
Assumption: you’re talking about customer‑managed ADLS Gen2, and you’ll configure UC catalogs/schemas to use that storage via external locations.
Option 1 – Single container per catalog, folders for bronze/silver/gold
If you follow UC patterns (domain catalogs + medallion schemas + managed tables), Option 1 is generally the best starting point.
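A minimal sketch of how Option 1 could look, assuming one container with `catalog`, `bronze`, `silver`, and `gold` folders and the four external locations described in the question. The storage account, container, credential, and catalog names are hypothetical placeholders, and the storage credential is assumed to exist already. The script only builds the Databricks SQL statements so they can be reviewed before anything runs.

```python
# Hypothetical placeholders; replace with your own names.
STORAGE_ACCOUNT = "examplestorage"
CONTAINER = "sales"
CREDENTIAL = "example_credential"  # assumed to exist already
CATALOG = "sales"
LAYERS = ("bronze", "silver", "gold")


def folder_url(folder):
    """Return the abfss:// URI of a top-level folder in the container."""
    return f"abfss://{CONTAINER}@{STORAGE_ACCOUNT}.dfs.core.windows.net/{folder}"


def option1_statements():
    """Build SQL for 4 external locations, 1 catalog, and 3 layer schemas."""
    statements = []
    # One external location per folder: catalog + bronze/silver/gold.
    for folder in ("catalog", *LAYERS):
        statements.append(
            f"CREATE EXTERNAL LOCATION IF NOT EXISTS {CATALOG}_{folder} "
            f"URL '{folder_url(folder)}' "
            f"WITH (STORAGE CREDENTIAL {CREDENTIAL})"
        )
    # The catalog's managed storage lives in the catalog folder.
    statements.append(
        f"CREATE CATALOG IF NOT EXISTS {CATALOG} "
        f"MANAGED LOCATION '{folder_url('catalog')}'"
    )
    # Each medallion layer becomes a schema with its own managed location.
    for layer in LAYERS:
        statements.append(
            f"CREATE SCHEMA IF NOT EXISTS {CATALOG}.{layer} "
            f"MANAGED LOCATION '{folder_url(layer)}'"
        )
    return statements


for statement in option1_statements():
    print(statement + ";")
```

In a Databricks notebook, each statement could be passed to `spark.sql(...)` once the names are adjusted. Managed tables created in `sales.bronze`, `sales.silver`, and `sales.gold` would then land in the matching folder.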
Option 2 – Three containers for layers, catalog stored inside bronze container
I’d avoid Option 2; it creates a confusing mixing of concerns.
Option 3 – Separate containers for catalog, bronze, silver, gold
So Option 3 is viable for high‑isolation scenarios, but you can usually simplify it: separate containers per domain/env, not strictly per medallion layer.
Recommendation
Given your description and desire for good scalability and governance:
Summary
Sunday
Thank you @Lu_Wang_ENB_DBX for the detailed explanation. Based on it, I think I'll go ahead with the first approach.
a week ago
Hi,
Option 2 should be avoided.
The real decision is between Option 1 (simpler) and Option 3 (best practice).
Why OPTION 2 is a NO GO:
This violates separation of concerns:
OPTION 3 (BEST PRACTICE):
Separate Containers for:
Why this is the best approach:
Each layer can have separate IAM roles and separate access policies.
Example:
You can map external locations like:
Then assign permissions as follows:
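As an illustration of the mapping and permissions above, here is a minimal sketch assuming one container per layer named `bronze`, `silver`, and `gold`. The storage account, credential, external location names, and the `data_engineers` / `analysts` groups are all hypothetical and not taken from this thread. The script only builds the Databricks SQL statements; it does not execute them.

```python
# All names below are hypothetical placeholders, not values from this thread.
STORAGE_ACCOUNT = "examplestorage"
CREDENTIAL = "example_credential"  # assumed to exist already

# One container per layer, each with its own set of grants.
LAYER_GRANTS = {
    "bronze": {"data_engineers": ["READ FILES", "WRITE FILES"]},
    "silver": {"data_engineers": ["READ FILES", "WRITE FILES"]},
    "gold": {
        "data_engineers": ["READ FILES", "WRITE FILES"],
        "analysts": ["READ FILES"],
    },
}


def option3_statements(grants=LAYER_GRANTS):
    """Build SQL that maps each layer container to an external location
    and grants file privileges per group."""
    statements = []
    for layer, principals in grants.items():
        location = f"{layer}_location"
        url = f"abfss://{layer}@{STORAGE_ACCOUNT}.dfs.core.windows.net/"
        statements.append(
            f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} "
            f"URL '{url}' WITH (STORAGE CREDENTIAL {CREDENTIAL})"
        )
        for principal, privileges in principals.items():
            statements.append(
                f"GRANT {', '.join(privileges)} "
                f"ON EXTERNAL LOCATION {location} TO `{principal}`"
            )
    return statements


for statement in option3_statements():
    print(statement + ";")
```

With this layout only the hypothetical `analysts` group is limited to read access, and only on the gold container. The separate catalog container from Option 3 is left out of the sketch and would need its own external location.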
Option 1 is good, but not ideal:
Sunday
I was going to follow the 3rd, but it violates our medallion architecture, and we don't have enough data to justify separating it physically. So I'm going with the 1st approach. But thank you very much @karthickrs, I'll keep this in mind 🙂