<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Job tasks monitoring in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156980#M54496</link>
    <description>&lt;P&gt;Your &lt;STRONG&gt;Delta status table is the right source of truth&lt;/STRONG&gt;. I would not rely on Databricks job webhooks for incremental progress; they are mainly for start/success/failure/duration events, not “200/500 completed” style progress.&lt;/P&gt;&lt;P&gt;Pattern: Backend API starts Databricks job -&amp;gt; Job writes task-level status to Delta table -&amp;gt; Separate lightweight monitor reads Delta by runId -&amp;gt; Monitor writes progress snapshot / sends API update -&amp;gt; Backend exposes progress to frontend.&lt;/P&gt;&lt;P&gt;Option 1: Have the backend run a query every few seconds&lt;/P&gt;&lt;P&gt;Option 2: Separate monitoring job&lt;BR /&gt;Create a small Databricks job or external service that runs independently from the processing job. It polls the Delta table by runId, calculates progress, and posts updates to your backend endpoint.&lt;/P&gt;&lt;P&gt;Option 3: Delta progress snapshot table&lt;/P&gt;&lt;P&gt;Instead of having the frontend/backend repeatedly query the detailed 500-row status table, create a compact table:&lt;/P&gt;&lt;P&gt;run_id&lt;BR /&gt;total_tasks&lt;BR /&gt;completed_tasks&lt;BR /&gt;failed_tasks&lt;BR /&gt;in_progress_tasks&lt;BR /&gt;ready_tasks&lt;BR /&gt;progress_pct&lt;BR /&gt;last_updated_ts&lt;/P&gt;&lt;P&gt;The backend then reads only one row per run, which is more scalable and API-friendly.&lt;/P&gt;&lt;P&gt;My recommendation: Option 3 + backend polling. Use the Delta status table as the source of truth, maintain a compact progress snapshot table, and let the frontend poll a backend endpoint such as:&lt;/P&gt;&lt;P&gt;GET /jobs/{runId}/progress&lt;/P&gt;</description>
    <pubDate>Fri, 15 May 2026 11:53:05 GMT</pubDate>
    <dc:creator>rdokala</dc:creator>
    <dc:date>2026-05-15T11:53:05Z</dc:date>
    <item>
      <title>Job tasks monitoring</title>
      <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156848#M54483</link>
      <description>&lt;P class=""&gt;&lt;SPAN&gt;Hello Community,&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;In short:&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;We have a multi-step job consisting of 4 stages. In one of the stages, the work is split into tasks per business unit. For simplicity, let’s assume there are 500 tasks that must be completed before moving to the next step, where the final data save operation is performed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;For debugging and audit purposes, all tasks are stored in a Delta table with statuses such as IN PROGRESS&lt;/SPAN&gt;&lt;SPAN&gt;, READY&lt;/SPAN&gt;&lt;SPAN&gt;, COMPLETED&lt;/SPAN&gt;&lt;SPAN&gt;, etc. Along with the status, we also store uniqueID&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;and runId&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;so we can identify which tasks belong to a specific job run.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;The entire workflow is triggered from our backend API using the Databricks SDK. 
At the end, we use webhooks to notify whether the job completed successfully or failed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;What we would like to achieve is displaying task progress to the user, for example:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;200/500 completed → 40%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;250/500 completed → 50%&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;etc.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;SPAN&gt;Since we already maintain a Delta table with task statuses, what would be the best way to communicate this progress back to our backend/frontend layer?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;Are there any native Databricks mechanisms or recommended patterns for this kind of monitoring/progress reporting?&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;SPAN&gt;One important requirement is that we do not want to calculate or expose progress directly within the processing job itself. We would prefer either:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;a separate monitoring job/process, or&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;an existing/native Databricks solution if available.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Thanks a lot in advance for your help and recommendations!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 May 2026 16:25:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156848#M54483</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-05-13T16:25:59Z</dc:date>
    </item>
    <item>
      <title>Re: Job tasks monitoring</title>
      <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156980#M54496</link>
      <description>&lt;P&gt;Your &lt;STRONG&gt;Delta status table is the right source of truth&lt;/STRONG&gt;. I would not rely on Databricks job webhooks for incremental progress; they are mainly for start/success/failure/duration events, not “200/500 completed” style progress.&lt;/P&gt;&lt;P&gt;Pattern: Backend API starts Databricks job -&amp;gt; Job writes task-level status to Delta table -&amp;gt; Separate lightweight monitor reads Delta by runId -&amp;gt; Monitor writes progress snapshot / sends API update -&amp;gt; Backend exposes progress to frontend.&lt;/P&gt;&lt;P&gt;Option 1: Have the backend run a query every few seconds&lt;/P&gt;&lt;P&gt;Option 2: Separate monitoring job&lt;BR /&gt;Create a small Databricks job or external service that runs independently from the processing job. It polls the Delta table by runId, calculates progress, and posts updates to your backend endpoint.&lt;/P&gt;&lt;P&gt;Option 3: Delta progress snapshot table&lt;/P&gt;&lt;P&gt;Instead of having the frontend/backend repeatedly query the detailed 500-row status table, create a compact table:&lt;/P&gt;&lt;P&gt;run_id&lt;BR /&gt;total_tasks&lt;BR /&gt;completed_tasks&lt;BR /&gt;failed_tasks&lt;BR /&gt;in_progress_tasks&lt;BR /&gt;ready_tasks&lt;BR /&gt;progress_pct&lt;BR /&gt;last_updated_ts&lt;/P&gt;&lt;P&gt;The backend then reads only one row per run, which is more scalable and API-friendly.&lt;/P&gt;&lt;P&gt;My recommendation: Option 3 + backend polling. Use the Delta status table as the source of truth, maintain a compact progress snapshot table, and let the frontend poll a backend endpoint such as:&lt;/P&gt;&lt;P&gt;GET /jobs/{runId}/progress&lt;/P&gt;</description>
      <pubDate>Fri, 15 May 2026 11:53:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/156980#M54496</guid>
      <dc:creator>rdokala</dc:creator>
      <dc:date>2026-05-15T11:53:05Z</dc:date>
    </item>
    <item>
      <title>Re: Job tasks monitoring</title>
      <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/157209#M54516</link>
      <description>&lt;P&gt;I don't think there is anything native for this in Databricks. The closest match would be the system tables (system.lakeflow.job_run_timeline / job_task_run_timeline), but I don't think they have the necessary grain for your pattern.&lt;/P&gt;
&lt;P&gt;There are probably two ways to approach it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Approach 1:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Enable Change Data Feed on your status Delta table: &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;ALTER&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;TABLE&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;…&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;SET&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;TBLPROPERTIES&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;(delta.enableChangeDataFeed&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;=&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;true)&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Create a Lakebase Postgres instance and a &lt;STRONG&gt;synced&lt;/STRONG&gt; &lt;STRONG&gt;table&lt;/STRONG&gt; in &lt;STRONG&gt;Continuous&lt;/STRONG&gt; &lt;STRONG&gt;mode&lt;/STRONG&gt; — it replicates the Delta table to Postgres with a minimum refresh interval of ~15 seconds.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Your backend API queries Postgres directly with a plain &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;COUNT(*)&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;FILTER&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;(WHERE&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;status='COMPLETED'),&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;COUNT(*)&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;FROM&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;tasks&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;WHERE&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;run_id&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;=&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;?&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;. That gives you millisecond latency, no Databricks SQL warehouse spin-up cost per request, and no progress logic in the processing job. Lakebase supports up to 1,000 concurrent connections, so you can poll from the frontend safely.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Bonus: same Lakebase instance can back other operational lookups for your app.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
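&lt;P&gt;A rough sketch of the progress query from Approach 1, in case it helps. It is not Lakebase-specific: an in-memory SQLite table stands in for the synced Postgres table so the example runs anywhere, and the table name, column names and get_progress helper are assumptions for illustration. It uses SUM(CASE ...) rather than FILTER only so the demo also runs on older SQLite builds; on Postgres the FILTER form above works as written.&lt;/P&gt;

```python
import sqlite3


def make_demo_db():
    """Build a stand-in for the synced table (names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (unique_id TEXT, run_id TEXT, status TEXT)")
    rows = (
        [(f"t{i}", "run-1", "COMPLETED") for i in range(200)]
        + [(f"t{i}", "run-1", "IN PROGRESS") for i in range(200, 500)]
        + [("other", "run-2", "COMPLETED")]  # a different run, must be ignored
    )
    conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", rows)
    return conn


def get_progress(conn, run_id):
    """Return completed/total counts and a percentage for one run."""
    completed, total = conn.execute(
        "SELECT COALESCE(SUM(CASE WHEN status = 'COMPLETED' THEN 1 ELSE 0 END), 0), "
        "COUNT(*) FROM tasks WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    # Guard against an unknown run_id, where total is 0.
    pct = round(100.0 * completed / total, 1) if total else 0.0
    return {"run_id": run_id, "completed": completed, "total": total, "progress_pct": pct}


progress = get_progress(make_demo_db(), "run-1")
print(progress)
# {'run_id': 'run-1', 'completed': 200, 'total': 500, 'progress_pct': 40.0}
```

&lt;P&gt;The backend endpoint would return this dictionary as JSON; the processing job itself never computes the percentage.&lt;/P&gt;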
&lt;P class="p1"&gt;&lt;STRONG&gt;Approach 2:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Point your backend at a small serverless SQL warehouse and call the Statement Execution API (&lt;A href="https://docs.databricks.com/api/workspace/statementexecution" target="_blank"&gt;https://docs.databricks.com/api/workspace/statementexecution&lt;/A&gt;) with a parameterized aggregate query keyed by &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;run_id&lt;/SPAN&gt;&lt;SPAN class="s1"&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Cache results in your backend for a few seconds to avoid hammering the warehouse.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class="p1"&gt;&lt;SPAN class="s1"&gt;Trade-off: serverless warehouse cold-starts and per-query latency are higher than Postgres; fine for a progress bar polled every 3–10s, less ideal if you need sub-second updates.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
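&lt;P&gt;For the caching step in Approach 2, something like the following might be enough. It is only a sketch: ProgressCache and fake_fetch are hypothetical names, and the stub stands in for whatever function actually calls the Statement Execution API, which is not shown here. The clock is injectable so the expiry behaviour can be checked without sleeping.&lt;/P&gt;

```python
import time


class ProgressCache:
    """Cache per-run progress for a few seconds to limit warehouse queries."""

    def __init__(self, fetch, ttl_seconds=5.0, clock=time.monotonic):
        self._fetch = fetch        # callable(run_id) returning progress (assumed)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # run_id -> (expires_at, value)

    def get(self, run_id):
        now = self._clock()
        entry = self._entries.get(run_id)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the warehouse
        value = self._fetch(run_id)
        self._entries[run_id] = (now + self._ttl, value)
        return value


# Stub in place of a real warehouse query; it records every call it receives.
calls = []


def fake_fetch(run_id):
    calls.append(run_id)
    return {"run_id": run_id, "completed": 200, "total": 500}


fake_now = [0.0]
cache = ProgressCache(fake_fetch, ttl_seconds=5.0, clock=lambda: fake_now[0])
cache.get("run-1")   # first call hits the stub
cache.get("run-1")   # served from the cache
fake_now[0] = 6.0    # move the fake clock past the TTL
cache.get("run-1")   # expired, so the stub is called again
print(len(calls))    # 2
```

&lt;P&gt;With a TTL of a few seconds, many frontend clients polling the same run translate into at most one warehouse query per TTL window for that run.&lt;/P&gt;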
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;You can continue using your existing job-success/fail webhook for the terminal signal. Use the approaches above only for the in-flight &lt;/SPAN&gt;&lt;SPAN class="s2"&gt;200/500&lt;/SPAN&gt;&lt;SPAN class="s1"&gt; progress updates. That avoids hammering anything when the run is idle.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;~Mohan Mathews, Lead DSA.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 May 2026 21:02:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/157209#M54516</guid>
      <dc:creator>MoJaMa</dc:creator>
      <dc:date>2026-05-18T21:02:04Z</dc:date>
    </item>
    <item>
      <title>Re: Job tasks monitoring</title>
      <link>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/157542#M54578</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/425"&gt;@MoJaMa&lt;/a&gt;&amp;nbsp;thanks a lot for these suggestions!&lt;/P&gt;</description>
      <pubDate>Sat, 23 May 2026 18:42:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-tasks-monitoring/m-p/157542#M54578</guid>
      <dc:creator>maikel</dc:creator>
      <dc:date>2026-05-23T18:42:03Z</dc:date>
    </item>
  </channel>
</rss>

