Data Engineering
How do you structure and store your medallion architecture?

William_Scardua
Valued Contributor

Hi guys,

What would you suggest for creating a medallion architecture? How many data lake zones, and which ones? How should the data be stored, and which databases should be used to store it? Anything helps 😃

I'm thinking of these zones:

1. Landing zone: files stored in /landing_zone; Databricks database.bronze stored in /bronze_container

2. Transformed zone: files stored in /transformation_zone; Databricks database.silver stored in /silver_container

3. Insight zone: files stored in /insight_zone; Databricks database.gold stored in /gold_container
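As a sketch, the mapping I have in mind would look something like this (the paths and database names are just the ones from the zones above, and this assumes external database locations so the database and the container point at the same files):

```python
# Hypothetical sketch: register each zone's container as the database
# location, so the Databricks database and the file path refer to the
# same Delta files. Requires a Databricks/Spark runtime (`spark`).
spark.sql("""
    CREATE DATABASE IF NOT EXISTS bronze
    LOCATION '/bronze_container'
""")
spark.sql("""
    CREATE DATABASE IF NOT EXISTS silver
    LOCATION '/silver_container'
""")
spark.sql("""
    CREATE DATABASE IF NOT EXISTS gold
    LOCATION '/gold_container'
""")

# Tables created in these databases then land inside the matching
# container, e.g. silver.customers -> /silver_container/customers
```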

But I have a question: starting from the transformed zone, the data is duplicated (/transformed_zone and /silver_container).

What do you think? What is the best practice?

Tks

4 REPLIES

mmlime
New Contributor III

With data lakes and the Hive metastore (external tables), I did it the same way:

  • landing storage, with containers based on the source system/vendor, kept separate from my lake to enable a better authorization model for systems/vendors pushing data
  • bronze container, like staging: directories per system, holding Delta tables of all untransformed data from landing
  • silver: parsed/cleansed data in Delta format, still organized by system and their object hierarchy
  • silver, enhanced data: my core data model, organized into entities (product, customer, ...)
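As a rough sketch of the landing -> bronze step above (the paths and the "erp" system name are placeholders, not anything standard):

```python
# Hedged sketch of landing -> bronze: copy untransformed vendor data
# into a per-system Delta table. Requires a Databricks/Spark runtime.
raw = (spark.read
       .format("json")                 # whatever format the vendor delivers
       .load("/landing/erp/orders/"))  # per-system landing container

(raw.write
    .format("delta")
    .mode("append")
    .save("/bronze/erp/orders"))       # untransformed copy as a Delta table
```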

But the way I see it nowadays:

Do you already use Unity Catalog? Is this still a question there? You are more and more pushed toward managed tables, so you no longer care about the structure of your lake/lakehouse; it becomes more and more a DDL representation of the data, like a DWH. You create the structure in your metastore under a UC managed location (which uses IDs, not human-readable paths, to store your tables).

So, in my opinion, the question is now more about how to organize your metastore (catalogs, databases, tables) to follow the medallion architecture than about how to structure your lake's containers/directories.
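For example, the metastore-side organization could be sketched like this (catalog and schema names are purely illustrative; with managed tables, UC chooses the storage paths for you):

```python
# Hypothetical sketch: medallion layers as Unity Catalog schemas.
# Requires a Databricks runtime with Unity Catalog enabled.
spark.sql("CREATE CATALOG IF NOT EXISTS lakehouse")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.gold")

# A managed table: no LOCATION clause, so UC stores the files itself
# under opaque IDs rather than human-readable directories.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.silver.customers
    (customer_id BIGINT, name STRING)
""")
```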

-werners-
Esteemed Contributor III

Agree, although I do not like it.

jose_gonzalez
Moderator

Hi @William Scardua​ ,

I would highly recommend using Delta Live Tables (DLT) for your use case. Please check the docs, which include sample notebooks, here: https://docs.databricks.com/workflows/delta-live-tables/index.html
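A minimal DLT bronze -> silver flow might look something like this (the table names, landing path, and columns are placeholders, not taken from the docs):

```python
# Hedged sketch of a DLT pipeline; runs only inside a Databricks
# Delta Live Tables pipeline, which provides `dlt` and `spark`.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders landed as-is (bronze)")
def orders_bronze():
    # Placeholder landing path and format
    return spark.read.format("json").load("/landing/orders/")

@dlt.table(comment="Cleansed orders (silver)")
def orders_silver():
    # Basic cleansing: drop rows without a key, stamp ingestion time
    return (dlt.read("orders_bronze")
            .where(F.col("order_id").isNotNull())
            .withColumn("ingested_at", F.current_timestamp()))
```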

Kaniz
Community Manager

Hi @William Scardua​, we haven't heard from you since the last response from @Jose Gonzalez​, and I was checking back to see if you have a resolution yet.

If you have found a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
