Parametrize the DLT pipeline for dynamic loading of many tables

data-engineer-d
New Contributor III

I am trying to ingest hundreds of tables with CDC, and I want to create a generic/dynamic pipeline that accepts parameters (e.g., table_name, schema, file path) and runs the same logic for each table. However, I am not able to find a way to pass parameters to the pipeline.

PS: I am aware of using a metadata table to iterate; however, I might trigger the pipeline via the REST API and pass parameters to it.
Also, I cannot use pipeline configurations for this, as they can't be passed dynamically before triggering an execution.
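
(For reference, "pipeline configurations" here means the key/value pairs defined in the pipeline settings, which the pipeline code reads with spark.conf.get; they are fixed in the pipeline definition rather than supplied per trigger. A minimal sketch of that mechanism, with hypothetical key names:)

import dlt

# These values come from the pipeline settings, not from the trigger call,
# which is why they cannot be changed per run. Key names are hypothetical.
table_name = spark.conf.get("source_table_name")
source_path = spark.conf.get("source_path")

@dlt.table(name=f"bronze_{table_name}")
def bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # assumed source file format
        .load(source_path)
    )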

3 REPLIES

Gilg
Contributor II

If you have a separate folder for each of your source tables, you can use a Python loop to iterate over those folders.

To do this, create a create_pipeline function that takes table_name, schema, and path as parameters. Inside this function, put the DLT function that creates your raw/bronze table, using those parameters.

You can then simply call that function in a loop over each folder in your path, listed using dbutils:

schema = "<your target schema>"
for folder in dbutils.fs.ls("<your path>"):
  table_name = folder.name[:-1]                      # folder names end with "/", so strip it
  create_pipeline(table_name, schema, folder.path)   # folder.path is the full source path
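
The create_pipeline function itself isn't shown above; a minimal sketch of what it could look like for a CDC load, where the source file format, the key column, and the sequencing column are assumptions you would replace with your own:

import dlt

def create_pipeline(table_name, schema, path):
    # Bronze: incrementally ingest this table's files with Auto Loader.
    # (With UC, the target catalog/schema can also be set once in the
    # pipeline settings instead of being passed per table.)
    @dlt.table(name=f"{table_name}_bronze")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")  # assumed source file format
            .load(path)
        )

    # Silver: merge the CDC feed into the target table.
    dlt.create_streaming_table(f"{table_name}_silver")
    dlt.apply_changes(
        target=f"{table_name}_silver",
        source=f"{table_name}_bronze",
        keys=["id"],                      # assumed primary-key column
        sequence_by="_change_timestamp",  # assumed ordering column
    )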




data-engineer-d
New Contributor III

Hi @Gilg, thank you for your response.
However, since I am working with Unity Catalog, this solution might not be suitable for me. Also, the plan is to use another orchestrator to trigger the jobs, so the parameters need to come in separately.

 

Gilg
Contributor II

The method that I mentioned will definitely work in workspaces that have UC enabled, as I am doing the same.

Also, I think I misinterpreted what you mean by schema: are you talking about the catalog schema or the data schema? If the catalog schema, then you just need to remove it from the function and create a separate pipeline for the tables in each catalog schema.

I'm not sure which orchestration tool you are planning to use, but this will work with Databricks Workflows and ADF. You can even build your own ETL framework on Databricks.
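
For example, an external orchestrator (ADF Web activity, Airflow, a Databricks Workflow task, etc.) can start a pipeline update through the Databricks Pipelines REST API. A minimal sketch in Python; the host, token, and pipeline id are placeholders, and note that this call only starts the pipeline as it is configured, it does not accept per-run parameters (which is the limitation discussed above):

import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                                 # placeholder
PIPELINE_ID = "<pipeline-id>"                                     # placeholder

# Start an update of the DLT pipeline (the same call an orchestrator would make).
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": False},
)
resp.raise_for_status()
print(resp.json())  # the response includes the update_id of the triggered run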
