Databricks Workflow design
3 weeks ago
I have 7-8 different DLT pipelines that have to run at the same time according to their batch type, i.e. hourly or daily. Right now they are triggered separately according to their batch type.
I want to move to the next stage and club all these DLT pipelines together in a single workflow. I can simply create one task per DLT pipeline, but I am not able to figure out how to schedule it both daily and hourly.
Let's say I schedule the workflow hourly and, according to some set of conditions (whether the run is daily or hourly, which I can get from a metadata table), trigger only specific tasks. I am not sure how to do that.
What would be the right approach if I want to keep all the DLT pipelines in a single workflow where some are on a daily schedule and some hourly? Also, these schedules are not fixed in Databricks; the workflows are triggered from outside according to the time.
Labels: Workflows
3 weeks ago
Hi there, thanks, I understood the approach, but how to implement it is what I am not able to figure out.
If you can give a small demo example, it would be really helpful.
3 weeks ago
Hello, thank you for your question!
Here's a general approach to achieve this, but please let us know if this doesn't match your requirement:
- Create a Parent Workflow with a Single Scheduled Trigger:
  - Schedule the workflow to run hourly, since that is the more frequent batch type.
  - Use a master task that queries the metadata table to determine which DLT pipelines should run in that execution.
- Use a Conditional Execution Mechanism:
  - Add a notebook task as the first step in the workflow that:
    - reads the metadata table (which contains the schedule information),
    - determines whether the run is hourly or daily based on the current timestamp, and
    - sets task values via dbutils.jobs.taskValues for downstream task execution.
- Configure Dynamic Task Execution:
  - Define one task per DLT pipeline in the workflow.
  - Use conditional execution (run a task only if its condition is met) to ensure that:
    - hourly pipelines run on every execution, and
    - daily pipelines run only when the master task determines it is a daily run.
- Use dbutils.jobs.taskValues to Control Execution:
  - In your master task, set a value like: dbutils.jobs.taskValues.set("run_daily_pipelines", "true")
  - Then configure each pipeline task to depend on the master task and set its execution condition based on that value (see the sketch below).
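For a concrete starting point, here is a minimal sketch of what the master notebook task could look like. It is only an illustration: the table name ops.pipeline_schedule, its columns, and the task value keys are assumed names, and the "daily at midnight UTC" rule stands in for whatever logic your metadata table actually drives. spark and dbutils are the objects already available in a Databricks notebook.

```python
# Hypothetical "master" notebook task (first task in the workflow).
# Assumes a metadata table ops.pipeline_schedule with columns
# pipeline_name and batch_type ("hourly" / "daily") -- adjust to your schema.
from datetime import datetime, timezone

now = datetime.now(timezone.utc)

# Example rule: treat the run as "daily" only on the first run after midnight UTC.
is_daily_run = now.hour == 0

# Optionally look up which pipelines are flagged as daily in the metadata table.
daily_pipelines = [
    r["pipeline_name"]
    for r in spark.table("ops.pipeline_schedule")
                  .filter("batch_type = 'daily'")
                  .collect()
]

# Publish values that downstream tasks can use in their run conditions.
dbutils.jobs.taskValues.set(key="run_daily_pipelines", value=str(is_daily_run).lower())
dbutils.jobs.taskValues.set(key="daily_pipelines", value=",".join(daily_pipelines))
```

Each daily DLT pipeline task can then sit behind an If/else condition task that compares the task value, e.g. {{tasks.master.values.run_daily_pipelines}} == "true" (where master is the task key of the notebook above; adjust the dynamic value reference to your task names), while hourly pipeline tasks depend only on the master task succeeding.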
Alternative Approach: Two Separate Workflows
If this still doesn't give you enough flexibility in conditional execution for your needs, consider:
- A daily workflow (triggered once per day).
- An hourly workflow (triggered every hour).
- Both workflows query the metadata table and only trigger the relevant DLT pipelines (see the sketch below).
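With this approach, the first task of each workflow can share the same logic and differ only in the batch type it filters on. Below is a hedged sketch, assuming the same ops.pipeline_schedule metadata table (storing the DLT pipeline_id per batch type) and that the Databricks Python SDK (databricks-sdk) is available on the cluster; the names and exact calls should be adapted to your environment.

```python
# Minimal sketch for the two-workflow alternative: each workflow's first task
# looks up its own batch type in the metadata table and starts the matching
# DLT pipelines. Table/column names are assumptions for illustration.
from databricks.sdk import WorkspaceClient

BATCH_TYPE = "hourly"  # set to "daily" in the daily workflow

w = WorkspaceClient()  # picks up the notebook's authentication context

pipeline_ids = [
    r["pipeline_id"]
    for r in spark.table("ops.pipeline_schedule")
                  .filter(f"batch_type = '{BATCH_TYPE}'")
                  .collect()
]

for pipeline_id in pipeline_ids:
    # Kick off an update for each relevant DLT pipeline.
    w.pipelines.start_update(pipeline_id=pipeline_id)
```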
Please let me know if your question was meant to be addressed more specifically, and/or if the above needs further clarification. In the meantime, hope it helps!
3 weeks ago
Hi @VZLA , I got the idea. There will be a small change in the way we use it: since we don't schedule the workflow in Databricks but trigger it using the API, I will pass a job parameter along with the trigger, set according to the timestamp, indicating whether it is a daily or hourly run, and then handle it inside the workflow.
Got the idea. Will circle back if any help is required.
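For reference, a minimal sketch of that kind of external trigger, assuming the Databricks Python SDK on the caller's side (a recent databricks-sdk version that supports job_parameters in run_now); the job ID and the batch_type parameter name are placeholders:

```python
# Hypothetical external trigger: pass the batch type as a job parameter
# instead of scheduling the job inside Databricks.
from datetime import datetime, timezone
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # auth from env vars / .databrickscfg on the external scheduler

now = datetime.now(timezone.utc)
batch_type = "daily" if now.hour == 0 else "hourly"

w.jobs.run_now(
    job_id=123456789,  # the workflow's job ID (placeholder)
    job_parameters={"batch_type": batch_type},
)
```

Inside the workflow, the master notebook can read the parameter with dbutils.widgets.get("batch_type"), or task conditions can reference it directly as {{job.parameters.batch_type}}.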