<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Job tasks monitoring in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156848#M54483</link>
    <description>&lt;P class=""&gt;&lt;SPAN&gt;Hello Community,&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;In short:&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a multi-step job consisting of 4 stages. In one of the stages, the work is split into tasks per business unit. For simplicity, let’s assume there are 500 tasks that must be completed before moving to the next step, where the final data save operation is performed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;For debugging and audit purposes, all tasks are stored in a Delta table with statuses such as IN PROGRESS&lt;/SPAN&gt;&lt;SPAN&gt;, READY&lt;/SPAN&gt;&lt;SPAN&gt;, COMPLETED&lt;/SPAN&gt;&lt;SPAN&gt;, etc. Along with the status, we also store uniqueID&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;and runId&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;so we can identify which tasks belong to a specific job run.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;The entire workflow is triggered from our backend API using the Databricks SDK. At the end, we use webhooks to notify whether the job completed successfully or failed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;What we would like to achieve is displaying task progress to the user, for example:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;200/500 completed → 40%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;250/500 completed → 50%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;etc.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;SPAN&gt;Since we already maintain a Delta table with task statuses, what would be the best way to communicate this progress back to our backend/frontend layer?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;Are there any native Databricks mechanisms or recommended patterns for this kind of monitoring/progress reporting?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;One important requirement is that we do not want to calculate or expose progress directly within the processing job itself. We would prefer either:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;a separate monitoring job/process, or&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;an existing/native Databricks solution if available.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Thanks a lot in advance for your help and recommendations!&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 13 May 2026 16:25:59 GMT</pubDate>
    <dc:creator>maikel</dc:creator>
    <dc:date>2026-05-13T16:25:59Z</dc:date>
    <item>
      <title>Job tasks monitoring</title>
      <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156848#M54483</link>
      <description>&lt;P class=""&gt;&lt;SPAN&gt;Hello Community,&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;In short:&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a multi-step job consisting of 4 stages. In one of the stages, the work is split into tasks per business unit. For simplicity, let’s assume there are 500 tasks that must be completed before moving to the next step, where the final data save operation is performed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;For debugging and audit purposes, all tasks are stored in a Delta table with statuses such as IN PROGRESS&lt;/SPAN&gt;&lt;SPAN&gt;, READY&lt;/SPAN&gt;&lt;SPAN&gt;, COMPLETED&lt;/SPAN&gt;&lt;SPAN&gt;, etc. Along with the status, we also store uniqueID&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;and runId&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;so we can identify which tasks belong to a specific job run.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;The entire workflow is triggered from our backend API using the Databricks SDK. At the end, we use webhooks to notify whether the job completed successfully or failed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;What we would like to achieve is displaying task progress to the user, for example:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;200/500 completed → 40%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;250/500 completed → 50%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;etc.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;SPAN&gt;Since we already maintain a Delta table with task statuses, what would be the best way to communicate this progress back to our backend/frontend layer?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;Are there any native Databricks mechanisms or recommended patterns for this kind of monitoring/progress reporting?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;One important requirement is that we do not want to calculate or expose progress directly within the processing job itself. We would prefer either:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;a separate monitoring job/process, or&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;an existing/native Databricks solution if available.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Thanks a lot in advance for your help and recommendations!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2026 16:25:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156848#M54483</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-05-13T16:25:59Z</dc:date>
    </item>
  </channel>
</rss>

