DLT - runtime parameterisation of execution
03-13-2024 12:24 PM
I have started using DLT in a prototype framework and have run into the challenge below. Any help would be appreciated.
First let me give a brief context:
- I have metadata sitting in a .json file that I read as the first task and put it into a log table with all the relevant attributes (including the list of tables to be processed by the DLT pipeline)
- That log table has multiple records, including those of past executions, so I have to filter it down to the current one using a timestamped ID (e.g. IngestAdventureWorks_20240314)
- For that I need to pass that ID as a parameter to the DLT pipeline so it can be used in a SQL query to find the relevant records and build the list of tables to be processed.
- When I hardcode it as a Key-Value pair during design-time I can access those values easily using the spark.conf.get("ID", None) syntax
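To illustrate, here is a minimal sketch of roughly how the design-time approach looks. The log table name `ingest_log` and the columns `RunId` and `TableName` are placeholders for this example, not the real names from my framework; only the `spark.conf.get("ID", None)` call is what I actually use.

```python
import re

# Hypothetical names: the log table and its columns are placeholders,
# not taken from the actual framework.
LOG_TABLE = "ingest_log"
CONF_KEY = "ID"


def get_run_id(conf_get):
    """Read the run ID through a getter shaped like spark.conf.get(key, default)."""
    run_id = conf_get(CONF_KEY, None)
    if run_id is None:
        raise ValueError(f"Pipeline configuration key '{CONF_KEY}' is not set")
    # Accept only word characters so the value is safe to embed in a SQL string.
    if not re.fullmatch(r"\w+", run_id):
        raise ValueError(f"Unexpected characters in run ID: {run_id!r}")
    return run_id


def build_table_list_query(run_id, log_table=LOG_TABLE):
    """Build the query that filters the log table down to the current execution."""
    return f"SELECT TableName FROM {log_table} WHERE RunId = '{run_id}'"


# Inside the DLT pipeline notebook this would be used roughly as:
#   run_id = get_run_id(spark.conf.get)
#   rows = spark.sql(build_table_list_query(run_id)).collect()
#   tables = [row.TableName for row in rows]
```

This works as long as the `ID` key is set in the pipeline configuration; the open point is how to set it at runtime instead of at design time.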
My question is how to pass that parameter at runtime, either from a task in a workflow (similar to how I can reference a prior task's output and pass it to a widget in a downstream notebook task) or by executing the DLT pipeline from a notebook.
This is important for making the solution fully dynamic, without hardcoded parameter values.
Thanks for any help in advance
István