Parametrize the DLT pipeline for dynamic loading of many tables
03-21-2024 10:53 AM
I am trying to ingest hundreds of tables with CDC, and I want to create a generic/dynamic pipeline which can accept parameters (e.g. table_name, schema, file path) and run the same logic for each table. However, I am not able to find a way to pass parameters to the pipeline.
PS: I am aware of using a metadata table to iterate; however, I intend to trigger the pipeline via the REST API and pass parameters to it at that point.
Also, I cannot use pipeline configurations for this, as they cannot be set dynamically right before triggering execution.
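For context, here is a minimal sketch of the kind of generic, parameterized CDC definition described above. The function name ingest_cdc_table, the key column id, the ordering column _commit_timestamp, and the Auto Loader options are all illustrative assumptions, not something taken from the thread.

import dlt
from pyspark.sql.functions import col

def ingest_cdc_table(table_name, schema, path):
    # Hypothetical generic CDC flow for one source table.
    @dlt.view(name=f"{table_name}_changes")
    def changes():
        # Auto Loader over the table's change feed; the file format is an assumption.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .schema(schema)
            .load(path)
        )

    dlt.create_streaming_table(name=table_name)

    dlt.apply_changes(
        target=table_name,
        source=f"{table_name}_changes",
        keys=["id"],                           # assumed primary key
        sequence_by=col("_commit_timestamp"),  # assumed ordering column
    )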
- Labels:
- Delta Lake
- Workflows
03-21-2024 03:32 PM
If you have a different folder for each of your source tables, you can use Python loops to iterate over the folders naturally.
To do this, create a create_pipeline function that takes table_name, schema, and path as parameters. Inside this function, put the DLT function that creates your raw or bronze table, using those parameters.
You can then simply call that function in a loop over each folder in your path using dbutils:
# Iterate over every source folder and register a table for each one.
for folder in dbutils.fs.ls("<your path>"):
    table_name = folder.name[:-1]  # folder.name ends with "/", so trim it
    create_pipeline(table_name, schema, folder.path)
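For reference, a rough sketch of what such a create_pipeline function could look like; the bronze_ prefix, the Parquet format, and the Auto Loader options are assumptions rather than anything stated in this reply.

import dlt

def create_pipeline(table_name, schema, path):
    # Hypothetical bronze-table definition; adjust the format/options to your source files.
    @dlt.table(name=f"bronze_{table_name}")
    def bronze():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .schema(schema)
            .load(path)
        )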
03-22-2024 04:14 AM
Hi @Gilg, thank you for your response.
However, as I am working with Unity Catalog, this solution might not be suitable for me. Also, the plan is to use another orchestrator to trigger the jobs, so the parameters need to be passed in separately.
03-22-2024 06:30 AM
The method I mentioned will definitely work in workspaces that have UC enabled, as I am doing the same thing.
Also, I think I misinterpreted what you mean by schema: are you talking about the catalog schema or the data schema? If it is the catalog schema, then you just need to remove it from the functions and create a separate pipeline for the tables in each catalog schema.
I'm not sure which orchestration tool you are planning to use, but this will work with Databricks Workflows and ADF. You can even build your own ETL framework using Databricks.
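As a rough illustration of that adjustment, assuming the schema argument in the earlier snippet referred to the catalog schema and has been dropped from create_pipeline, while the target schema is fixed in each pipeline's settings:

# One pipeline per catalog schema: the target schema comes from the pipeline
# settings, so create_pipeline only needs the table name and its source path.
for folder in dbutils.fs.ls("<your path>"):
    table_name = folder.name[:-1]
    create_pipeline(table_name, folder.path)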

