How to combine medallion architecture and Delta Live Tables nicely?

Erik
Valued Contributor II

Like many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), with each layer stored in a separate storage account. We only create proper Hive tables for the gold-layer tables, so our Power BI users connecting to the Databricks SQL endpoint only see those and not the silver/bronze ones.

I have looked into using Delta Live Tables and am left with three questions:

1: How do we create a DLT pipeline that stores the bronze/silver/gold tables in their correct storage accounts? Or is this no longer the recommended way, and should we now keep everything in the same storage account?

2: How do we best organize the tables into bronze/silver/gold? An illustration is this example from the (quite cool) Databricks Mosaic project: there are many tables, but the medallion separation does not seem to be encoded anywhere. Is there a best practice here? Prepend e.g. "bronze_" to the table name? Tags?

3: And (related to question 2), is there a best practice for making only the gold tables visible to end users? Permissions, maybe?

ACCEPTED SOLUTION

merca
Valued Contributor II

I can answer the first question:

You can control where the data is stored by setting the `path` parameter on each table. The "storage path" in the pipeline settings will then only hold checkpoints (and some other pipeline metadata), while the data is stored in the correct account. (If you leave the pipeline "storage path" blank, the default `dbfs:/pipelines/<pipeline-id>` is used.)
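
For illustration, here is a minimal sketch of a Python DLT table pinned to its own storage account via `path`; the storage account, container names, source path, and table name are hypothetical placeholders:

```python
import dlt

# Hypothetical bronze table whose data is written to a dedicated
# storage account via the `path` parameter; the URIs are placeholders.
@dlt.table(
    name="bronze_events",
    comment="Raw events ingested from the landing zone",
    path="abfss://bronze@examplestorage.dfs.core.windows.net/events",
)
def bronze_events():
    # Incrementally pick up raw JSON files with Auto Loader.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@examplestorage.dfs.core.windows.net/events")
    )
```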

Every Delta Live Table is created in the metastore, so schema/table grants should be used to manage permissions per layer. We use schemas to separate the layers.
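
As a rough sketch of that grant pattern (the schema and group names here are made up), you can expose gold while leaving bronze/silver with no grants at all:

```python
# Hypothetical: expose only the gold schema to the reporting group.
# Assumes table access control is enabled on the workspace/cluster.
spark.sql("GRANT USAGE ON SCHEMA gold TO `powerbi-users`")
spark.sql("GRANT SELECT ON SCHEMA gold TO `powerbi-users`")

# With no grants on the bronze/silver schemas, users in `powerbi-users`
# browsing from the SQL endpoint should only see the gold tables.
```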

My wish is that the target schema could also be set as a table property and overridden per table, not only in the pipeline settings.

