Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT - runtime parameterisation of execution

MartinIsti
New Contributor III

I have started to use DLT in a prototype framework and I now face the challenge below, for which any help would be appreciated.

First let me give a brief context:

  • I have metadata sitting in a .json file that I read as the first task and put it into a log table with all the relevant attributes (including the list of tables to be processed by the DLT pipeline)
  • That log table has multiple records including those of past executions so I have to filter it down to the current one using a timestamp (e.g. IngestAdventureWorks_20240314)
  • For that I need to pass that ID as a parameter to the DLT pipeline so it can be used in a SQL query to find the relevant records and build the list of tables to be processed.
  • When I hardcode it as a key-value pair at design time I can access those values easily using the spark.conf.get("ID", None) syntax
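A minimal sketch of that last step (the key name "ID" and the fail-loudly behaviour are assumptions; spark.conf.get only exists inside a pipeline, so a plain dict's .get stands in here to keep the snippet runnable anywhere):

```python
# Hypothetical sketch of reading a run ID from the DLT pipeline configuration.
# In a real pipeline you would pass spark.conf.get as conf_get.
def get_run_id(conf_get, key="ID"):
    """Fetch the run identifier, failing loudly if the parameter is missing."""
    value = conf_get(key, None)
    if value is None:
        raise ValueError(f"Pipeline parameter '{key}' was not set")
    return value

# Stand-in for spark.conf with a design-time key-value pair:
fake_conf = {"ID": "IngestAdventureWorks_20240314"}.get
print(get_run_id(fake_conf))  # IngestAdventureWorks_20240314
```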
 
My question/challenge is how to pass that parameter using either a task in a workflow (similar to how I can reference a prior task's output and pass it to a widget in a downstream notebook task) or by executing the DLT pipeline from a notebook.
 
This is important for making the solution truly dynamic without hardcoding parameter values.
 
Thanks for any help in advance
 
István
5 REPLIES

Thanks Kaniz for your response. It would have been great to have an approach similar to widgets in a normal notebook. Specifying these parameters at design time does not allow the flexibility needed to run my DLT pipeline in a truly metadata-driven way.

I was also going towards using the job REST API from a notebook but then I ended up tweaking my configuration tables in a way that I can utilise a hardcoded parameter in the DLT definition and still have it dynamic.
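For the record, one version of that REST workaround is a two-step pattern: edit the pipeline's configuration map (PUT /api/2.0/pipelines/{pipeline_id}) and then start an update (POST /api/2.0/pipelines/{pipeline_id}/updates). This is a sketch under those assumptions, not an official recipe; check the current Pipelines API reference before relying on it. The payload builder below is plain Python so it runs anywhere:

```python
# Hypothetical helper: merge a run ID into a DLT pipeline's settings before
# sending them back via the Pipelines REST API. The key name "ID" and the
# edit-then-update pattern are assumptions.
def with_run_id(settings, run_id, key="ID"):
    """Return a copy of pipeline settings with configuration[key] = run_id."""
    updated = dict(settings)
    config = dict(updated.get("configuration", {}))
    config[key] = run_id
    updated["configuration"] = config
    return updated

current = {"name": "IngestAdventureWorks", "configuration": {"env": "dev"}}
print(with_run_id(current, "IngestAdventureWorks_20240314"))
# {'name': 'IngestAdventureWorks', 'configuration': {'env': 'dev', 'ID': 'IngestAdventureWorks_20240314'}}
```

The copy-before-mutate style keeps the original settings dict untouched, so a failed PUT can be retried from the unmodified state.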

If the REST API call functionality could be integrated into the workflows later on to pass these values as to other tasks, that would be really great!

I accept it as a solution because your third suggestion would work. I still keep hoping a more integrated approach will come in the future 😉

Hi @MartinIsti, how did you manage to tweak the metadata to handle this dynamically? Could you please elaborate on what you described below?

"I ended up tweaking my configuration tables in a way that I can utilise a hardcoded parameter in the DLT definition and still have it dynamic."
Sure, and for the record I'm still not fully happy with how parameters need to be set at design time.

As mentioned, I store the metadata in a .json file that I read using a standard notebook. I then save its content to DBFS as a Delta table, overwriting any previous version. The DLT notebook reads from that table, and I only need to specify the name of the process (e.g. IngestAdventureWorks); that name matches the name of the DLT pipeline itself (or can be derived from it).

Once I determine which table to read from, the DLT pipeline can be driven by the metadata in that table.
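A toy illustration of that lookup (pure Python, with invented column names; in the real pipeline this would be a filter on the Delta metadata table, e.g. via spark.read.table):

```python
# Hypothetical sketch of the metadata-driven lookup described above.
# Column names "process_name" and "table_name" are assumptions.
def tables_for_process(metadata_rows, process_name):
    """Return the list of tables registered for the given process name."""
    return [row["table_name"]
            for row in metadata_rows
            if row["process_name"] == process_name]

metadata = [
    {"process_name": "IngestAdventureWorks", "table_name": "sales"},
    {"process_name": "IngestAdventureWorks", "table_name": "customers"},
    {"process_name": "OtherProcess", "table_name": "misc"},
]
print(tables_for_process(metadata, "IngestAdventureWorks"))
# ['sales', 'customers']
```

In a DLT notebook, each name returned by such a lookup could then drive a generated @dlt.table definition.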

I still find working with DLT inconsistent with orchestrating standard notebook-driven data handling; it is the odd one out that often needs a slightly different approach, but so far I have found a workaround for each of these small inconsistencies.

@MartinIsti thanks for your detailed explanation.

data-engineer-d
Contributor

@Retired_mod Can you please provide some reference for the REST API approach? I do not see it available in the docs.

TIA

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won't want to miss the chance to attend and share knowledge.

If there isn't a group near you, start one and help create a community that brings people together.

Request a New Group