<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks Workflow design in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107559#M42833</link>
<description>&lt;P&gt;I have 7-8 different DLT pipelines that have to run according to their batch type, i.e. hourly or daily. Right now they are triggered separately according to their batch type.&lt;/P&gt;&lt;P&gt;I want to move to the next stage and club all these DLT pipelines together in a single workflow. I can simply create one task per DLT pipeline, but I cannot figure out how to schedule some tasks daily and others hourly.&lt;/P&gt;&lt;P&gt;Let's say I schedule the workflow hourly; then, based on a set of conditions (daily vs. hourly, which I can read from a metadata table), I would trigger only specific tasks. But I am not sure how to do it.&lt;/P&gt;&lt;P&gt;What would be the right approach if I want to keep all the DLT pipelines in a single workflow where some are on a daily schedule and some hourly? These schedules are not fixed in Databricks; the runs are triggered from outside according to the time.&lt;/P&gt;</description>
    <pubDate>Wed, 29 Jan 2025 11:06:52 GMT</pubDate>
    <dc:creator>ashraf1395</dc:creator>
    <dc:date>2025-01-29T11:06:52Z</dc:date>
    <item>
      <title>Databricks Workflow design</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107559#M42833</link>
<description>&lt;P&gt;I have 7-8 different DLT pipelines that have to run according to their batch type, i.e. hourly or daily. Right now they are triggered separately according to their batch type.&lt;/P&gt;&lt;P&gt;I want to move to the next stage and club all these DLT pipelines together in a single workflow. I can simply create one task per DLT pipeline, but I cannot figure out how to schedule some tasks daily and others hourly.&lt;/P&gt;&lt;P&gt;Let's say I schedule the workflow hourly; then, based on a set of conditions (daily vs. hourly, which I can read from a metadata table), I would trigger only specific tasks. But I am not sure how to do it.&lt;/P&gt;&lt;P&gt;What would be the right approach if I want to keep all the DLT pipelines in a single workflow where some are on a daily schedule and some hourly? These schedules are not fixed in Databricks; the runs are triggered from outside according to the time.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 11:06:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107559#M42833</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-01-29T11:06:52Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Workflow design</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107564#M42837</link>
<description>&lt;P&gt;Hi there, thanks, I understood the approach, but how to implement it is what I cannot figure out.&lt;BR /&gt;If you could give a small demo example, it would be really helpful.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 11:29:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107564#M42837</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-01-29T11:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Workflow design</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107566#M42838</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Hello, thank you for your question!&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;Here’s a general approach to achieve this, but please let us know if the requirement understanding does not align:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Create a Parent Workflow with a Single Scheduled Trigger:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Schedule the workflow to run hourly since that is the more frequent batch type.&lt;/LI&gt;
&lt;LI&gt;Use a master task that queries the metadata table to determine which DLT pipelines should run in that execution.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Use a Conditional Execution Mechanism:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Add a notebook task as the first step in the workflow that:&lt;/SPAN&gt;
&lt;UL class="_1t7bu9h8 _1t7bu9h2"&gt;
&lt;LI&gt;Reads the metadata table (which contains schedule information).&lt;/LI&gt;
&lt;LI&gt;Determines if the run is hourly or daily based on the current timestamp.&lt;/LI&gt;
&lt;LI&gt;Sets job task values via &lt;CODE&gt;dbutils.jobs.taskValues.set()&lt;/CODE&gt; so downstream tasks can read the decision.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Configure Dynamic Task Execution:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Define one task per DLT pipeline in the workflow.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Use conditional execution (Run if condition is met) to ensure that:&lt;/SPAN&gt;
&lt;UL class="_1t7bu9h8 _1t7bu9h2"&gt;
&lt;LI&gt;Hourly pipelines run on every execution.&lt;/LI&gt;
&lt;LI&gt;Daily pipelines run only when the master task determines it's a daily run.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Use &lt;CODE&gt;dbutils.jobs.taskValues()&lt;/CODE&gt; to Control Execution:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;In your master task, set a value like:&amp;nbsp;dbutils.jobs.taskValues.set("run_daily_pipelines", "true")&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;Then, configure each pipeline task with &lt;CODE&gt;Depends on&lt;/CODE&gt; the master task and set execution conditions based on the variable.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
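&lt;P class="_1t7bu9h1 paragraph"&gt;A minimal sketch of the master task's decision logic, under stated assumptions: the "first run after midnight UTC is the daily run" rule and the &lt;CODE&gt;run_daily_pipelines&lt;/CODE&gt; key are illustrative placeholders, not part of your workflow.&lt;/P&gt;

```python
# Master-task sketch: decide whether this hourly execution should also
# cover the daily pipelines. Here the first run after midnight UTC counts
# as the daily run; your metadata table could encode a different rule.
from datetime import datetime, timezone

def is_daily_run(now=None):
    """Return True when this execution should also run the daily pipelines."""
    now = now or datetime.now(timezone.utc)
    return now.hour == 0

run_daily = is_daily_run()

# Inside a Databricks notebook you would then publish the flag for the
# downstream tasks (not runnable outside Databricks, hence commented out):
# dbutils.jobs.taskValues.set(key="run_daily_pipelines",
#                             value=str(run_daily).lower())
```

&lt;P class="_1t7bu9h1 paragraph"&gt;Each daily pipeline task can then use a run-if condition comparing the task value to &lt;CODE&gt;"true"&lt;/CODE&gt;.&lt;/P&gt;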
&lt;H3 class="_1jeaq5e0 _1t7bu9h9 heading3"&gt;Alternative Approach: Two Separate Workflows&lt;/H3&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;If this is still not having enough flexible conditional execution for your needs, consider:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A daily workflow (triggered once per day).&lt;/LI&gt;
&lt;LI&gt;An hourly workflow (triggered every hour).&lt;/LI&gt;
&lt;LI&gt;Both workflows query the metadata table and only trigger relevant DLT pipelines.&lt;/LI&gt;
&lt;/UL&gt;
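&lt;P class="_1t7bu9h1 paragraph"&gt;In this variant, both workflows share the same metadata lookup. A plain-Python sketch of that lookup (the table rows and pipeline ids are invented for illustration; in a notebook this would be a Spark query against your real metadata table):&lt;/P&gt;

```python
# Sketch of the shared metadata lookup: each workflow asks only for the
# pipelines matching its own batch type, then triggers those pipelines.
PIPELINE_METADATA = [
    {"pipeline_id": "dlt_orders",    "batch_type": "hourly"},
    {"pipeline_id": "dlt_customers", "batch_type": "daily"},
    {"pipeline_id": "dlt_inventory", "batch_type": "hourly"},
]

def pipelines_for(batch_type):
    """Return the pipeline ids the given workflow should trigger."""
    return [row["pipeline_id"] for row in PIPELINE_METADATA
            if row["batch_type"] == batch_type]
```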
&lt;P class="_1t7bu9h1 paragraph"&gt;Please let me know if you're question was meant to be more specifically addressed, and/or if the above needs further clarification. In the meantime, hope it helps!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 11:41:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107566#M42838</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2025-01-29T11:41:30Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Workflow design</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107569#M42840</link>
<description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;, I got the idea. There will be a small change in the way we use it: since we don't schedule the workflow in Databricks but trigger it via the API, I will pass a job parameter along with the trigger, indicating per the timestamp whether it is a daily or hourly run, and then handle it inside the workflow.&lt;BR /&gt;&lt;BR /&gt;Got the idea. Will circle back if any help is required.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jan 2025 11:53:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-workflow-design/m-p/107569#M42840</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-01-29T11:53:41Z</dc:date>
    </item>
  </channel>
</rss>

