Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How do you structure and store your medallion architecture?

William_Scardua
Valued Contributor

Hi guys,

What are your suggestions for building a medallion architecture? How many data lake zones and which ones, how to store the data, which databases to use for storage, anything 😃

I am thinking of these zones:

1. Landing zone: file storage in /landing_zone, with the Databricks database.bronze stored in /bronze_container

2. Transformed zone: file storage in /transformation_zone, with the Databricks database.silver stored in /silver_container

3. Insight zone: file storage in /insight_zone, with the Databricks database.gold stored in /gold_container

But I have a question: from the transformed zone onward, the data is duplicated (in /transformed_zone and in /silver_container).

What do you think? What is the best practice?

Thanks

3 REPLIES

mmlime
New Contributor III

With data lakes and the Hive metastore (external tables), I did it the same way.

  • Landing storage, with containers per source system/vendor, kept separate from my lake so that a better authorization model lets each system/vendor push its own data.
  • Bronze container, like staging: directories per system holding Delta tables of all untransformed data from landing.
  • Silver: parsed/cleansed data in Delta format, still following the source systems and their object hierarchy.
  • Silver, enhanced data? This is my core data model, organized by entities (product, customer...).
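
To make the layout in the list above concrete, here is a minimal Python sketch of the path conventions it implies. It only builds path strings and does not touch any storage. The function names, the "core" directory, and the example system and object names ("erp", "orders") are hypothetical and are not taken from the thread.

```python
# A sketch of the directory conventions described in the list above.
# All names here (functions, "core", "erp", "orders") are illustrative assumptions.

LAKE_LAYERS = ("bronze", "silver")


def landing_path(system: str, obj: str) -> str:
    """Landing storage is separate from the lake, with one container per source system."""
    return f"/{system}/{obj}"


def lake_path(layer: str, system: str, obj: str) -> str:
    """Bronze and silver keep the source system and its object hierarchy."""
    if layer not in LAKE_LAYERS:
        raise ValueError(f"unknown lake layer: {layer}")
    return f"/{layer}/{system}/{obj}"


def core_path(entity: str) -> str:
    """The enhanced data is organized by business entity, not by source system."""
    return f"/silver/core/{entity}"


if __name__ == "__main__":
    print(landing_path("erp", "orders"))
    print(lake_path("bronze", "erp", "orders"))
    print(lake_path("silver", "erp", "orders"))
    print(core_path("customer"))
```

The point of the sketch is that the first three stages are keyed by source system, while the core model is keyed by entity, so the switch from one hierarchy to the other happens inside the silver stage.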

But the way I see it nowadays:

Do you already use Unity Catalog? Is this even still a question there? You are increasingly pushed to use managed tables, so you no longer care much about the structure of your lake / lakehouse. It is becoming more and more a DDL representation of the data, like in a DWH. You create the structure in your Metastore *** UC managed location (it uses IDs rather than human-readable paths to store your tables in storage).

So now, in my opinion, the question is more about how to organize your metastore (catalogs, databases, tables) to follow the medallion architecture than about how to structure your lake containers/directories.
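
One possible way to organize the metastore along medallion lines is a single catalog with one schema (database) per layer. The Python sketch below only generates the corresponding SQL statements and fully qualified table names as strings. The catalog name "lakehouse", the table name "orders", and the choice of one schema per layer are assumptions for illustration, not something the thread prescribes; a catalog per layer or per environment would be an equally valid layout.

```python
from typing import List

# Assumed layout: one catalog, one schema per medallion layer.
LAYERS = ("bronze", "silver", "gold")


def namespace_ddl(catalog: str) -> List[str]:
    """Return SQL statements that create the catalog and one schema per layer."""
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    statements += [
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer}" for layer in LAYERS
    ]
    return statements


def table_name(catalog: str, layer: str, table: str) -> str:
    """Build a three-level name: catalog.schema.table."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{catalog}.{layer}.{table}"


if __name__ == "__main__":
    for statement in namespace_ddl("lakehouse"):
        print(statement)
    print(table_name("lakehouse", "silver", "orders"))
```

With managed tables, no storage path appears anywhere in this sketch: the layer is expressed only through the schema name, which matches the point that the metastore structure replaces the directory structure.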

-werners-
Esteemed Contributor III

Agree, although I do not like it.

jose_gonzalez
Databricks Employee

Hi @William Scardua,

I highly recommend using Delta Live Tables (DLT) for your use case. Please check the docs, which include sample notebooks, here: https://docs.databricks.com/workflows/delta-live-tables/index.html
