Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Create a Metadata-Driven Data Pipeline in Databricks

Pratikmsbsvm
Contributor

I am creating a Data Pipeline as shown below.

[Pipeline diagram: Pratikmsbsvm_0-1754408926145.png]

1. Files from multiple input sources land in their respective folders in the Bronze layer.

2. Databricks performs the transformations and loads the transformed data to Azure SQL, and also to the ADLS Gen2 Silver layer (not shown in the figure).

How can I write PySpark code that reads and transforms multiple folders and multiple files, driven by a metadata table?

I want to control the execution of the code through a metadata table. Is there any other way to parameterize it?

Also, would it be possible to do schema validation with the metadata table approach?

Please help. 

Pardon me if it sounds unrealistic.

Thanks a lot 

1 ACCEPTED SOLUTION


szymon_dybczak
Esteemed Contributor III

Hi @Pratikmsbsvm ,

It's a totally realistic requirement. In fact, you can find many articles that suggest approaches for designing such a control table.
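
To give a rough idea of what such a control table can drive, here is a minimal, illustrative sketch. The table name meta.pipeline_control and its columns (source_path, file_format, expected_columns, target_table, is_active) are placeholders made up for the example, not a standard, so adapt them to your own design:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical control table: one row per source feed in the Bronze layer.
# Columns assumed here: source_path, file_format, expected_columns, target_table, is_active
control_df = spark.table("meta.pipeline_control").filter("is_active = true")

for feed in control_df.collect():
    # 1. Read every file in this feed's Bronze folder
    df = (spark.read
              .format(feed["file_format"])    # e.g. "csv", "json", "parquet"
              .option("header", "true")       # only relevant for CSV, harmless otherwise
              .load(feed["source_path"]))     # e.g. "abfss://bronze@<storage>.dfs.core.windows.net/sales/"

    # 2. Basic schema validation against the metadata: fail fast if expected columns are missing
    expected_cols = [c.strip() for c in feed["expected_columns"].split(",")]
    missing = set(expected_cols) - set(df.columns)
    if missing:
        raise ValueError(f"{feed['source_path']}: missing expected columns {missing}")

    # 3. Example transformation -- stamp each record with its load time
    df = df.withColumn("_ingested_at", F.current_timestamp())

    # 4. Write to the Silver layer as a Delta table
    (df.write
       .format("delta")
       .mode("append")
       .saveAsTable(feed["target_table"]))

The articles below go much deeper into the same pattern (logging, incremental loads, dependency handling), so treat the snippet above only as a starting point.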

Take, for example, the following article:

https://medium.com/dbsql-sme-engineering/a-primer-for-metadata-driven-frameworks-with-databricks-wor...

Or this one: 

https://community.databricks.com/t5/technical-blog/metadata-driven-etl-framework-in-databricks-part-...

There is also a DLT metadata-driven framework (DLT-META) that you can try for free:

https://github.com/databrickslabs/dlt-meta
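
And since you also mentioned loading the transformed data to Azure SQL: that step can stay metadata-driven as well by keeping the JDBC target table name in the same control table. A hedged sketch using the standard Spark JDBC writer; the server, database, secret scope and key names below are placeholders, and dbutils is only available inside Databricks notebooks/jobs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder Silver table -- inside the loop above this would be the transformed DataFrame.
silver_df = spark.table("silver.sales")

jdbc_url = (
    "jdbc:sqlserver://<your-server>.database.windows.net:1433;"
    "database=<your-database>;encrypt=true;loginTimeout=30"
)

connection_props = {
    "user": dbutils.secrets.get(scope="my-scope", key="sql-user"),      # placeholder secret scope/keys
    "password": dbutils.secrets.get(scope="my-scope", key="sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Append the transformed data to the Azure SQL target table
(silver_df.write
    .mode("append")
    .jdbc(url=jdbc_url, table="dbo.sales", properties=connection_props))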

