How do you structure and store your medallion architecture?
10-21-2022 03:29 PM
Hi guys,
What do you suggest for creating a medallion architecture? How many data lake zones and which ones, how to store the data, which databases to use for storage, anything helps!
I am thinking of these zones:
1. Landing zone: files stored in /landing_zone; Databricks database.bronze stored in /bronze_container
2. Transformed zone: files stored in /transformation_zone; Databricks database.silver stored in /silver_container
3. Insight zone: files stored in /insight_zone; Databricks database.gold stored in /gold_container
But I have a question: from the transformed zone on, the data is duplicated (/transformation_zone and /silver_container).
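Roughly what I mean, as a minimal sketch (the mount paths and database/table names below are just my current setup, not a recommendation; `spark` is the Databricks notebook session):

```python
from pyspark.sql import functions as F

# 1. Landing zone: raw files as delivered by the source system
raw_df = spark.read.json("/mnt/landing_zone/sales/")

# 2. Transformed zone: cleansed files written to the transformation container...
clean_df = (raw_df
            .dropDuplicates(["order_id"])
            .withColumn("_loaded_at", F.current_timestamp()))
clean_df.write.format("delta").mode("append").save("/mnt/transformation_zone/sales/")

# ...and the same data saved again as the silver table in its own container,
# which is the duplication I am asking about.
spark.sql("CREATE DATABASE IF NOT EXISTS silver")
(clean_df.write.format("delta")
    .mode("append")
    .option("path", "/mnt/silver_container/sales/")
    .saveAsTable("silver.sales"))
```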
What do you think is the best practice?
Thanks
10-22-2022 01:53 AM
With data lakes and the Hive metastore (external tables) I did it much the same way (rough sketch after the list):
- landing storage, with containers per source system/vendor, kept separate from my lake to allow a better authorization model for the systems/vendors pushing data
- bronze container, like staging: directories per source system holding Delta tables of all untransformed data from landing
- silver: parsed/cleansed data in Delta format, still organized by source system and their object hierarchy
- silver (enhanced data?): my core data model, i.e. entities (product, customer, ...)
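To make that concrete, here is a minimal sketch of the Hive-metastore/external-table setup; the storage accounts, containers, and table names are made up for illustration, and `spark` is the session the Databricks notebook provides:

```python
# Bronze: untransformed Delta copy of the landing files, one directory per source system
(spark.read.format("json")
    .load("abfss://landing@vendor1storage.dfs.core.windows.net/orders/")
    .write.format("delta")
    .mode("append")
    .save("abfss://bronze@lakestorage.dfs.core.windows.net/vendor1/orders/"))

# Register the bronze directory as an external table: the database is just a
# DDL layer over the lake structure, nothing is copied a second time.
spark.sql("CREATE DATABASE IF NOT EXISTS bronze")
spark.sql("""
  CREATE TABLE IF NOT EXISTS bronze.vendor1_orders
  USING DELTA
  LOCATION 'abfss://bronze@lakestorage.dfs.core.windows.net/vendor1/orders/'
""")

# Silver: parsed/cleansed data and core entities, registered the same way
spark.sql("CREATE DATABASE IF NOT EXISTS silver")
spark.sql("""
  CREATE TABLE IF NOT EXISTS silver.customer
  USING DELTA
  LOCATION 'abfss://silver@lakestorage.dfs.core.windows.net/entities/customer/'
""")
```

The point is that the database objects are only metadata over the existing Delta directories, so the zone files and the "database" tables are the same physical data.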
But the way I see it nowadays: do you already use Unity Catalog? Is this even still a question there? You are more and more pushed towards managed tables, so you no longer care much about the structure of your lake/lakehouse; it increasingly becomes a DDL representation of the data, like a DWH. You create the structure in your metastore, and UC stores the tables under its managed location (it uses IDs rather than human-readable paths).
So in my opinion the question is now more how to organize your metastore (catalogs, databases, tables) to follow the medallion architecture than how to structure your lake containers/directories.
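With UC it looks more like this, a rough sketch where the catalog and schema names are just one possible layout, not an official recommendation:

```python
# One catalog, one schema per medallion layer
spark.sql("CREATE CATALOG IF NOT EXISTS lakehouse")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.gold")

# Managed table: no LOCATION is given, UC decides where the files live under
# its managed storage location (non-human-readable ids).
df = spark.read.table("lakehouse.bronze.vendor1_orders")
(df.dropDuplicates(["order_id"])
   .write.mode("overwrite")
   .saveAsTable("lakehouse.silver.orders"))
```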
10-24-2022 04:17 AM
Agree, although I do not like it.
10-24-2022 11:13 AM
Hi @William Scardua,
I would highly recommend using Delta Live Tables (DLT) for your use case. Please check the docs with sample notebooks here: https://docs.databricks.com/workflows/delta-live-tables/index.html
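For reference, a minimal DLT sketch of a bronze -> silver -> gold flow; the landing path and column names are placeholders, not from the original question:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw files ingested as-is from the landing zone")
def bronze_orders():
    # Auto Loader incrementally picks up new files from the landing container
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing_zone/orders/"))

@dlt.table(comment="Cleansed, deduplicated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    return dlt.read_stream("bronze_orders").dropDuplicates(["order_id"])

@dlt.table(comment="Daily revenue for reporting")
def gold_daily_revenue():
    return (dlt.read("silver_orders")
            .groupBy(F.to_date("order_ts").alias("order_date"))
            .agg(F.sum("amount").alias("revenue")))
```

DLT then manages the storage and table registration for each layer, so you define the medallion structure as code rather than as container paths.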

