<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Task Hanging issue on DBR 15.4 in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140565#M4559</link>
    <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I am running a structured streaming pipeline with 5 models loaded using pyfunc.spark_udf. Lately we have been noticing a very strange issue of tasks getting hung and batches taking a very long time to finish execution.&lt;BR /&gt;&lt;BR /&gt;CPU utilization is around 90% and memory utilization is steady.&lt;BR /&gt;&lt;BR /&gt;&lt;U&gt;&lt;STRONG&gt;Issue:&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Configs:&lt;BR /&gt;DBR 15.4&lt;BR /&gt;Job Compute&lt;BR /&gt;1 driver and 4 workers&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-11-27 at 10.24.41 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21989i493ADC463CEAC4D9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2025-11-27 at 10.24.41 PM.png" alt="Screenshot 2025-11-27 at 10.24.41 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 28 Nov 2025 03:26:17 GMT</pubDate>
    <dc:creator>Dharma25</dc:creator>
    <dc:date>2025-11-28T03:26:17Z</dc:date>
    <item>
      <title>Task Hanging issue on DBR 15.4</title>
      <link>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140565#M4559</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I am running a structured streaming pipeline with 5 models loaded using pyfunc.spark_udf. Lately we have been noticing a very strange issue of tasks getting hung and batches taking a very long time to finish execution.&lt;BR /&gt;&lt;BR /&gt;CPU utilization is around 90% and memory utilization is steady.&lt;BR /&gt;&lt;BR /&gt;&lt;U&gt;&lt;STRONG&gt;Issue:&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;Configs:&lt;BR /&gt;DBR 15.4&lt;BR /&gt;Job Compute&lt;BR /&gt;1 driver and 4 workers&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2025-11-27 at 10.24.41 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21989i493ADC463CEAC4D9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2025-11-27 at 10.24.41 PM.png" alt="Screenshot 2025-11-27 at 10.24.41 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Nov 2025 03:26:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140565#M4559</guid>
      <dc:creator>Dharma25</dc:creator>
      <dc:date>2025-11-28T03:26:17Z</dc:date>
    </item>
    <item>
      <title>Re: Task Hanging issue on DBR 15.4</title>
      <link>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140597#M4562</link>
      <description>&lt;P&gt;On DBR 15.4 the DeadlockDetector: TASK_HANGING message usually just means Spark has noticed some very long-running tasks and is checking for deadlocks. With multiple pyfunc.spark_udf models in a streaming query the tasks often appear “stuck” because the Python UDF is blocking (heavy model inference, external calls, or GIL contention) while CPU stays high and memory stays steady.&lt;/P&gt;&lt;P&gt;I’d suggest:&lt;BR /&gt;– checking the Structured Streaming metrics to see if the batch is still progressing,&lt;BR /&gt;– taking executor thread dumps to confirm threads are blocked inside the UDF,&lt;BR /&gt;– testing the pipeline with fewer models / simplified UDFs to isolate which one causes the hang,&lt;BR /&gt;– making sure models are loaded once per executor and not doing network/I/O per row, and, if possible, moving to vectorised/Pandas UDFs, as in the sketch below.&lt;/P&gt;&lt;P&gt;If the same code works on an older LTS runtime (try running it on 14.3 or even an older one) but hangs on 15.4, it may be a runtime regression and worth raising with Databricks Support, including the job and run IDs.&lt;/P&gt;
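&lt;P&gt;As a rough sketch of that loading/monitoring pattern (the model URI, table names and checkpoint path below are placeholders, not taken from your job):&lt;/P&gt;&lt;PRE&gt;import mlflow.pyfunc
from pyspark.sql import functions as F

# `spark` is the ambient SparkSession on Databricks.
# Load each model once per executor via spark_udf instead of re-loading it
# inside a plain Python UDF (placeholder model URI).
predict_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/my_model/1",
    result_type="double",
)

stream_df = spark.readStream.table("source_table")              # placeholder source
scored_df = stream_df.withColumn(
    "prediction", predict_udf(F.struct(*stream_df.columns))
)

query = (
    scored_df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/scoring")   # placeholder path
    .toTable("scored_table")                                     # placeholder sink
)

# Check whether micro-batches are still progressing or genuinely hung.
print(query.lastProgress)&lt;/PRE&gt;&lt;P&gt;If the batchId in lastProgress stops advancing while CPU stays high, the executor thread dumps will usually show exactly where the Python workers are blocked.&lt;/P&gt;</description>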
      <pubDate>Fri, 28 Nov 2025 11:36:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140597#M4562</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2025-11-28T11:36:09Z</dc:date>
    </item>
    <item>
      <title>Re: Task Hanging issue on DBR 15.4</title>
      <link>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140615#M4565</link>
      <description>&lt;P&gt;Thank you very much for your recommendations.&lt;/P&gt;&lt;P&gt;Additionally, I noticed that each executor typically has 32 active tasks by default. However, when looking at the task execution summary tab under the DAG for various stages, it displays 300 tasks.&lt;/P&gt;&lt;P&gt;Moreover, I found that calling `coalesce(1)` first and then distributing the data across all 10 models significantly improves performance, with batches running much faster.&lt;/P&gt;
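&lt;P&gt;For reference, the change I mean looks roughly like this (batch_df and model_udfs are just placeholders for the streaming DataFrame and the spark_udf wrappers, not the actual job code):&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import functions as F

# Collapse to a single partition first, then apply each loaded model UDF.
# model_udfs is a placeholder dict: {"model_name": spark_udf_wrapper, ...}.
compact_df = batch_df.coalesce(1)

scored_df = compact_df
for name, model_udf in model_udfs.items():
    scored_df = scored_df.withColumn(
        f"prediction_{name}", model_udf(F.struct(*compact_df.columns))
    )&lt;/PRE&gt;</description>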
      <pubDate>Fri, 28 Nov 2025 14:18:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/task-hanging-issue-on-dbr-15-4/m-p/140615#M4565</guid>
      <dc:creator>Dharma25</dc:creator>
      <dc:date>2025-11-28T14:18:05Z</dc:date>
    </item>
  </channel>
</rss>

