<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Job stuck while utilizing all workers in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66412#M33105</link>
    <description>&lt;P&gt;As Spark is lazily evaluated, using a small cluster for reads and a large one for writes is not something that will happen.&lt;BR /&gt;The data is only read when you apply an action (a write, for example).&lt;BR /&gt;That being said: I have no knowledge of a bug in Databricks that causes clusters to get stuck while they keep consuming DBUs.&lt;BR /&gt;I think your code might be the issue here, as you mention 'iterating over data'. That is something that should be avoided as much as possible (though it is not always possible).&lt;/P&gt;</description>
    <pubDate>Wed, 17 Apr 2024 08:18:54 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2024-04-17T08:18:54Z</dc:date>
    <item>
      <title>Job stuck while utilizing all workers</title>
      <link>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66401#M33103</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It did this successfully for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in.&lt;/P&gt;&lt;P&gt;I can find one &lt;STRONG&gt;Failed Stage&lt;/STRONG&gt;&amp;nbsp;that reads&amp;nbsp;&lt;BR /&gt;&lt;SPAN&gt;org.apache.spark.SparkException: Failed to fetch spark://10.139.64.10:35257/jars/org_apache_sedona_sedona_sql_3_0_2_12_1_3_1_incubating.jar during dependency update&lt;BR /&gt;[at scala/java/spark..., threadpoolexecutor/executor...]&lt;BR /&gt;Caused by: java.io.IOException: No such file or directory&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The pattern for the entire job was to use few workers with low CPU and memory on read, then scale up to 14 workers with high CPU and memory on write. However, once it got stuck on the 5th period, the worker count, CPU and memory stayed consistently high, incurring hundreds of dollars of cost over the next several hours.&lt;BR /&gt;&lt;BR /&gt;While all the workers were&amp;nbsp;&lt;EM&gt;active&lt;/EM&gt;, only one of them had an active task:&lt;BR /&gt;&lt;SPAN&gt;"WAITING on java.util.concurrent.locks.ReentrantLock"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Is this a Databricks issue?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 07:51:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66401#M33103</guid>
      <dc:creator>PrebenOlsen</dc:creator>
      <dc:date>2024-04-17T07:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: Job stuck while utilizing all workers</title>
      <link>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66412#M33105</link>
      <description>&lt;P&gt;As Spark is lazily evaluated, using a small cluster for reads and a large one for writes is not something that will happen.&lt;BR /&gt;The data is only read when you apply an action (a write, for example).&lt;BR /&gt;That being said: I have no knowledge of a bug in Databricks that causes clusters to get stuck while they keep consuming DBUs.&lt;BR /&gt;I think your code might be the issue here, as you mention 'iterating over data'. That is something that should be avoided as much as possible (though it is not always possible).&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 08:18:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66412#M33105</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-04-17T08:18:54Z</dc:date>
    </item>
    <item>
      <title>Re: Job stuck while utilizing all workers</title>
      <link>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66413#M33106</link>
      <description>&lt;P&gt;Hi Werners, I agree with your explanation of read and write, but that's what the GUI&amp;nbsp;&lt;EM&gt;looks&lt;/EM&gt; like. For each iteration (spark.read.table.where(col("month") == "January"), and then February in the next iteration), it spends about 30 minutes on only 3 workers, until it finally scales up to 14 workers for the next 60 minutes. What is it doing in those 30 minutes?&lt;/P&gt;&lt;P&gt;The code is very heavy, as it iterates over data (within a period of time) to continuously remove rows based on some criteria. I'll make a new thread about this.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2024 08:24:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-stuck-while-utilizing-all-workers/m-p/66413#M33106</guid>
      <dc:creator>PrebenOlsen</dc:creator>
      <dc:date>2024-04-17T08:24:38Z</dc:date>
    </item>
  </channel>
</rss>

