Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to combine medallion architecture and delta live-tables nicely?

Erik
Valued Contributor III

Like many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), with each layer stored in a separate storage account. We only create proper Hive tables for the gold-layer tables, so our Power BI users connecting to the Databricks SQL endpoint only see these and not the silver/bronze ones.

I looked into using Delta Live Tables and am left with three questions:

1: How do I create a DLT pipeline that stores the bronze/silver/gold tables in their correct storage accounts? Or is this no longer the recommended way, and should we now have everything in the same storage account?

2: How best to organize the tables into bronze/silver/gold? An illustration is this example from the (quite cool) Databricks Mosaic project: there are many tables, but the medallion separation does not seem to be encoded anywhere. Is there a best practice here? Prepend e.g. "bronze_" to the table name? Tags?

3: And (related to question 2), is there a best practice for making only the gold tables visible to end users? Permissions, maybe?

1 ACCEPTED SOLUTION

Accepted Solutions

merca
Valued Contributor II

I can answer the first question:

You can set the storage location per table with the `path` parameter. The "storage path" in the pipeline settings will then only hold checkpoints (and some other pipeline metadata), and the data will be stored in the correct account (if you leave the pipeline "storage path" blank, the default `dbfs:/pipelines/<pipeline-id>` is used).
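As a minimal sketch of what the per-table `path` looks like in a DLT Python notebook (the storage account URLs and source path below are hypothetical placeholders, and this only runs inside a Databricks DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col

# Bronze table: stored in the bronze storage account via `path`,
# instead of under the pipeline's default storage location.
@dlt.table(
    name="bronze_orders",
    path="abfss://bronze@examplebronzeacct.dfs.core.windows.net/orders",
)
def bronze_orders():
    # Auto Loader ingestion from a hypothetical raw landing zone.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/orders")
    )

# Silver table: stored in the silver account, reading from the bronze table.
@dlt.table(
    name="silver_orders",
    path="abfss://silver@examplesilveracct.dfs.core.windows.net/orders",
)
def silver_orders():
    return dlt.read_stream("bronze_orders").where(col("order_id").isNotNull())
```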

Every Delta Live Tables table is created in the metastore, so schema/table grants should be used to manage permissions per layer. We use schemas to separate the layers.
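For example, with one schema (database) per layer, you could grant the reporting group access to the gold schema only and leave bronze/silver ungranted. The schema and group names are assumptions, the statements use the legacy table ACL syntax (Unity Catalog uses `USE SCHEMA` instead of `USAGE`), and this runs only on a cluster or SQL endpoint with table access control enabled:

```python
# Hypothetical layer schemas: bronze, silver, gold.
# Only the gold schema is made visible to the Power BI group;
# with no grants on bronze/silver, those stay hidden from end users.
for stmt in [
    "GRANT USAGE ON DATABASE gold TO `powerbi_users`",
    "GRANT SELECT ON DATABASE gold TO `powerbi_users`",
]:
    spark.sql(stmt)
```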

My wish is that the schema could also be set as a table property and overridden per table, not only in the pipeline settings.


