Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Create a Metadata-Driven Data Pipeline in Databricks

Pratikmsbsvm
Contributor

I am creating a Data Pipeline as shown below.

[Pipeline diagram: Pratikmsbsvm_0-1754408926145.png]

1. Files from multiple input sources land in their respective folders in the Bronze layer.

2. Databricks performs the transformations and loads the transformed data to Azure SQL, and also to the ADLS Gen2 Silver layer (not shown in the figure).

How can I write PySpark code that reads and transforms multiple folders and multiple files, driven by a metadata table?

I want to control the execution of the code through a metadata table. Is there any other way to parameterize it?

Also, would it be possible to do schema validation with the metadata table approach?

Please help. 

Pardon me if it sounds unrealistic.

Thanks a lot 

1 ACCEPTED SOLUTION


szymon_dybczak
Esteemed Contributor III

Hi @Pratikmsbsvm ,

It's a totally realistic requirement. In fact, you can find many articles that suggest approaches for designing such a control table.
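
To give a rough idea of what such a control table can drive, here is a minimal, illustrative sketch. The table name meta.pipeline_control and its columns (source_path, file_format, expected_columns, target_table, is_active) are placeholders made up for the example, not a standard, so adapt them to your own design:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical control table: one row per source feed in the Bronze layer.
# Columns assumed here: source_path, file_format, expected_columns, target_table, is_active
control_df = spark.table("meta.pipeline_control").filter("is_active = true")

for feed in control_df.collect():
    # 1. Read every file in this feed's Bronze folder
    df = (spark.read
              .format(feed["file_format"])    # e.g. "csv", "json", "parquet"
              .option("header", "true")       # only relevant for CSV, harmless otherwise
              .load(feed["source_path"]))     # e.g. "abfss://bronze@<storage>.dfs.core.windows.net/sales/"

    # 2. Basic schema validation against the metadata: fail fast if expected columns are missing
    expected_cols = [c.strip() for c in feed["expected_columns"].split(",")]
    missing = set(expected_cols) - set(df.columns)
    if missing:
        raise ValueError(f"{feed['source_path']}: missing expected columns {missing}")

    # 3. Example transformation -- stamp each record with its load time
    df = df.withColumn("_ingested_at", F.current_timestamp())

    # 4. Write to the Silver layer as a Delta table
    (df.write
       .format("delta")
       .mode("append")
       .saveAsTable(feed["target_table"]))

The articles below go much deeper into the same pattern (logging, incremental loads, dependency handling), so treat the snippet above only as a starting point.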

Take, for example, the following article:

https://medium.com/dbsql-sme-engineering/a-primer-for-metadata-driven-frameworks-with-databricks-wor...

Or this one: 

https://community.databricks.com/t5/technical-blog/metadata-driven-etl-framework-in-databricks-part-...

There is also a DLT metadata-driven framework (DLT-META) that you can try for free:

https://github.com/databrickslabs/dlt-meta
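
And since you also mentioned loading the transformed data to Azure SQL: that step can stay metadata-driven as well by keeping the JDBC target table name in the same control table. A hedged sketch using the standard Spark JDBC writer; the server, database, secret scope and key names below are placeholders, and dbutils is only available inside Databricks notebooks/jobs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder Silver table -- inside the loop above this would be the transformed DataFrame.
silver_df = spark.table("silver.sales")

jdbc_url = (
    "jdbc:sqlserver://<your-server>.database.windows.net:1433;"
    "database=<your-database>;encrypt=true;loginTimeout=30"
)

connection_props = {
    "user": dbutils.secrets.get(scope="my-scope", key="sql-user"),      # placeholder secret scope/keys
    "password": dbutils.secrets.get(scope="my-scope", key="sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Append the transformed data to the Azure SQL target table
(silver_df.write
    .mode("append")
    .jdbc(url=jdbc_url, table="dbo.sales", properties=connection_props))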

