
Parametrize the DLT pipeline for dynamic loading of many tables

data-engineer-d
Contributor

I am trying to ingest hundreds of tables with CDC and want to build a generic/dynamic pipeline that accepts parameters (e.g., table_name, schema, file path) and runs the same logic for each table. However, I am not able to find a way to pass parameters to the pipeline.

PS: I am aware of using a metadata table to iterate over the tables; however, I might trigger the pipeline via the REST API by passing parameters to it.
Also, I cannot use pipeline configurations for this, as they can't be passed dynamically before triggering execution.
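
For illustration, a minimal sketch of the kind of parameterized CDC logic this would need, assuming Auto Loader (cloudFiles) JSON sources and the dlt.apply_changes API; the function name, key columns, and sequence column are illustrative, not from the original post:

import dlt
from pyspark.sql.functions import col

# Illustrative only: one parameterized CDC flow per table.
# (spark and dlt are available inside a DLT pipeline notebook.)
def ingest_with_cdc(table_name, source_path, keys, sequence_col):

    # Raw change feed for this table, read incrementally with Auto Loader.
    @dlt.table(name=f"{table_name}_cdc_raw")
    def raw_changes():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")  # adjust to your feed format
            .load(source_path)
        )

    # Apply the change feed into a streaming target table.
    dlt.create_streaming_table(name=table_name)
    dlt.apply_changes(
        target=table_name,
        source=f"{table_name}_cdc_raw",
        keys=keys,
        sequence_by=col(sequence_col),
    )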

3 REPLIES

Gilg
Contributor II

If you have a separate folder for each of your source tables, you can use a Python loop to iterate over the folders.

To do this, create a create_pipeline function that takes table_name, schema, and path as parameters. Inside this function, define the DLT function that creates your raw or bronze table using those parameters, as sketched below.
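
A minimal sketch of such a create_pipeline function, assuming Auto Loader (cloudFiles) with JSON sources; the table naming and options are illustrative:

import dlt

# Illustrative: each call registers one bronze table in the pipeline.
def create_pipeline(table_name, schema, path):

    @dlt.table(name=f"bronze_{table_name}", comment=f"Raw ingest for {table_name}")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")  # adjust to your source format
            .schema(schema)                       # data schema passed in as a parameter
            .load(path)
        )

    return bronze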

You can then simply call that function in a loop, once for each folder in your path, using dbutils:

# Loop over the source folders and register one DLT table per folder.
for folder in dbutils.fs.ls("<your path>"):
    table_name = folder.name.rstrip("/")              # folder names end with a trailing "/"
    create_pipeline(table_name, schema, folder.path)  # schema: the data schema for this table




Hi @Gilg, thank you for your response.
However, as I am working with Unity Catalog, this solution might not be suitable for me. Also, the plan is to use another orchestrator to trigger jobs, so the parameters need to be passed in separately.

 

The method I mentioned will definitely work in workspaces that have UC enabled, as I am doing the same.

Also, I think I misinterpreted what you mean by schema: are you talking about the catalog schema or the data schema? If the catalog schema, you just need to remove it from the function and create a separate pipeline for tables in a different catalog schema.
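
In that case, the helper sketched earlier shrinks to something like this, assuming the pipeline's own catalog/target setting determines where the tables land (signature and options illustrative):

# Catalog and schema come from the pipeline's target settings, so the helper
# only needs the table name and the source path.
def create_pipeline(table_name, path):

    @dlt.table(name=f"bronze_{table_name}")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.inferColumnTypes", "true")  # infer the data schema instead
            .load(path)
        )

    return bronze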

I'm not sure which orchestration tool you are planning to use, but this will work with Databricks Workflows and ADF. You can even build your own ETL framework on Databricks.
