Hi,
I am using a medallion architecture on Azure Data Lake Storage Gen2 with Azure Databricks. Currently, I am storing data in Parquet format (not Delta tables), and I am planning to implement Unity Catalog (UC).
As part of this setup, I understand that the storage paths used by catalogs and schemas in UC need to be covered by external locations. From an architecture and governance perspective, I am considering the following approaches:
Option 1: Single container for entire catalog
One container holds both the catalog storage and the layer data
Separate folders inside the container for bronze, silver, and gold layers
Results in 4 external locations (1 for catalog + 3 for layers)
Data is logically separated (via folders), not physically (via containers)
Option 2: Three containers for layers, catalog within bronze
Separate containers for bronze, silver, and gold
Catalog stored inside the bronze container (in a separate folder)
Results in 4 external locations
Concern: this mixes catalog storage with the bronze layer, which may not align well with medallion principles
Option 3: Four separate containers
Separate containers for catalog, bronze, silver, and gold
Results in 4 external locations
Provides clear physical separation, but increases IAM and governance overhead
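To make the three options concrete, here is a rough Python sketch of the external location URLs each one would produce. The storage account name (examplelake), the container name lakehouse, and the folder names catalog and data are hypothetical placeholders, not part of the actual setup. The overlap check assumes that external location URLs in UC should not be nested inside one another, which is worth confirming against the Databricks documentation; for that reason the sketch puts the bronze data in Option 2 in its own data folder next to the catalog folder.

```python
ACCOUNT = "examplelake"  # hypothetical storage account name
LAYERS = ("bronze", "silver", "gold")


def abfss(container, folder=""):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<folder>."""
    return f"abfss://{container}@{ACCOUNT}.dfs.core.windows.net/{folder}"


# Option 1: one container ("lakehouse" is a placeholder), four sibling folders.
option_1 = {name: abfss("lakehouse", name) for name in ("catalog",) + LAYERS}

# Option 2: one container per layer; the catalog lives in a folder of the
# bronze container. The "data" folder for bronze is an assumption made here
# so that the bronze and catalog locations do not overlap.
option_2 = {
    "catalog": abfss("bronze", "catalog"),
    "bronze": abfss("bronze", "data"),
    "silver": abfss("silver"),
    "gold": abfss("gold"),
}

# Option 3: four containers, each used from its root.
option_3 = {name: abfss(name) for name in ("catalog",) + LAYERS}


def overlapping(locations):
    """Return (outer, inner) name pairs where one URL is a path prefix of another."""
    urls = {name: url.rstrip("/") + "/" for name, url in locations.items()}
    return [
        (outer, inner)
        for outer in urls
        for inner in urls
        if outer != inner and urls[inner].startswith(urls[outer])
    ]


for label, layout in (("Option 1", option_1), ("Option 2", option_2), ("Option 3", option_3)):
    print(label)
    for name, url in layout.items():
        print(f"  {name:8} {url}")
    print(f"  overlapping pairs: {overlapping(layout)}")
```

All three layouts end up with four non-overlapping URLs, so the options differ only in where the container boundaries fall, not in the number of external locations.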
Question:
Which of these approaches is considered best practice from a scalability, governance, and Unity Catalog design perspective? Are there any recommended patterns for structuring storage and external locations when using UC with a medallion architecture?