Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parametrize the DLT pipeline for dynamic loading of many tables

data-engineer-d
Contributor

I am trying to ingest hundreds of tables with CDC, and I want to create a generic/dynamic pipeline that accepts parameters (e.g. table_name, schema, file path) and runs the same logic for each table. However, I am not able to find a way to pass parameters to the pipeline.

PS: I am aware of iterating over a metadata table; however, I may trigger the pipeline via the REST API and pass parameters to it.
Also, I cannot use pipeline configurations for this, as they can't be set dynamically before triggering execution.
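For reference, the metadata-table approach mentioned above can be sketched in plain Python: a control table holds one row per source table, and a loop turns each row into the parameter set a generic pipeline function would consume. The rows, field names, and paths below are hypothetical stand-ins, not part of any Databricks API.

```python
# Hypothetical metadata rows: in practice these would come from a Delta
# control table (e.g. spark.table("meta.ingestion_config").collect()).
metadata = [
    {"table_name": "orders",    "schema": "sales", "path": "/mnt/raw/orders"},
    {"table_name": "customers", "schema": "sales", "path": "/mnt/raw/customers"},
]

def build_table_configs(rows):
    """Turn metadata rows into (table_name, schema, path) parameter sets."""
    return [(r["table_name"], r["schema"], r["path"]) for r in rows]

# Each tuple would drive one call to a generic table-creation function.
configs = build_table_configs(metadata)
```

The limitation the post describes still applies: this loop fixes the table list at pipeline definition time, whereas REST-triggered parameters would vary per run.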

3 REPLIES

Gilg
Contributor II

If you have a separate folder for each of your source tables, you can use a Python loop to iterate over those folders.

To do this, create a create_pipeline function that takes table_name, schema, and path as parameters. Inside this function, put the DLT function that creates your raw/bronze table, using those parameters.

You can then simply call the function in a loop, once for each folder in your path, using dbutils:

for folder in dbutils.fs.ls("<your path>"):
  table_name = folder.name[:-1]  # folder names end with "/", so strip it
  create_pipeline(table_name, schema, folder.path)
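Outside a Databricks notebook, dbutils isn't available, so here is a minimal, self-contained sketch of the same loop: it fakes the FileInfo entries that dbutils.fs.ls() returns and uses a stand-in create_pipeline that just records its arguments (in a real DLT pipeline, that function would define the bronze table). The folder names and schema value are hypothetical.

```python
from collections import namedtuple

# Minimal stand-in for the entries dbutils.fs.ls() yields on Databricks.
FileInfo = namedtuple("FileInfo", ["path", "name"])

listing = [
    FileInfo("dbfs:/mnt/raw/orders/", "orders/"),
    FileInfo("dbfs:/mnt/raw/customers/", "customers/"),
]

created = []

def create_pipeline(table_name, schema, path):
    # Stand-in: the real function would register a DLT table definition
    # (e.g. an @dlt.table-decorated reader) for this source folder.
    created.append((table_name, schema, path))

for folder in listing:
    table_name = folder.name[:-1]  # strip the trailing "/" from the folder name
    create_pipeline(table_name, "bronze", folder.path)
```

Because DLT evaluates the notebook to discover table definitions, each loop iteration registers one more table in the same pipeline run.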




Hi @Gilg , thank you for your response.
However, since I am working with Unity Catalog, this solution might not be suitable for me. Also, the plan is to trigger jobs from another orchestrator, so the parameters need to come in separately.

 

The method I mentioned will definitely work in workspaces with UC enabled, as I am doing the same.

Also, I think I misinterpreted what you meant by schema: are you talking about the catalog schema or the data schema? If the catalog schema, you just need to remove it from the function and create a separate pipeline for tables in a different catalog schema.

I'm not sure which orchestration tool you are planning to use, but this will work with Databricks Workflows and ADF. You can even build your own ETL framework on Databricks.
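For triggering from an external orchestrator, one common pattern is to call the Databricks Jobs API (POST /api/2.1/jobs/run-now) and pass per-run values via job_parameters; the job_id and parameter names below are hypothetical, and job_parameters assumes the job was defined with job-level parameters. A minimal sketch of building that request body:

```python
import json

# Hypothetical run-now request body for the Jobs API 2.1 endpoint
# POST {workspace-url}/api/2.1/jobs/run-now. An orchestrator (ADF, Airflow,
# etc.) would send this with a bearer token; here we only build the payload.
payload = {
    "job_id": 1234,  # hypothetical job ID
    "job_parameters": {
        "table_name": "orders",
        "schema": "sales",
        "path": "/mnt/raw/orders",
    },
}

body = json.dumps(payload)
```

The tasks inside the job can then read these values (e.g. as job parameters in a notebook task) and hand them to the table-creation logic.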
