<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Spark Driver failed due to DRIVER_UNAVAILABLE but not due to memory pressure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-driver-failed-due-to-driver-unavailable-but-not-due-to/m-p/65457#M32814</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a job cluster running a streaming job, and it unexpectedly failed on 19th March with&amp;nbsp;DRIVER_UNAVAILABLE&amp;nbsp;(Request timed out, Driver is temporarily unavailable) in the event log. This is the run:&amp;nbsp;&lt;A href="https://atlassian-discover.cloud.databricks.com/jobs/323849284041517/runs/395169892801478?o=4482001201517624" target="_blank"&gt;https://atlassian-discover.cloud.databricks.com/jobs/323849284041517/runs/395169892801478?o=4482001201517624&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I'm aware of a KB article covering the same problem:&amp;nbsp;&lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;. It points out that memory pressure is a common cause. However, according to the driver stdout, there were only minor GCs, each taking around 30-40 ms, around the time the driver became unavailable:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="duliu_0-1712192893352.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6902iF80733D7E348156E/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="duliu_0-1712192893352.png" alt="duliu_0-1712192893352.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I also checked the driver log (log4j logs). It doesn't contain any error messages, and the few warning messages are unrelated. 
In fact, the driver continued writing logs for several minutes after the&amp;nbsp;DRIVER_UNAVAILABLE&amp;nbsp;error appeared in the event log.&lt;/P&gt;&lt;P&gt;I tried loading the Spark UI, but after a long wait with messages saying it was processing files, it failed with the following error, so I can't see the Spark history UI either:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="duliu_1-1712192913524.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6903i7E36ECA8F5D2D1C9/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="duliu_1-1712192913524.png" alt="duliu_1-1712192913524.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could anyone help, please?&lt;/P&gt;</description>
    <pubDate>Thu, 04 Apr 2024 01:09:20 GMT</pubDate>
    <dc:creator>duliu</dc:creator>
    <dc:date>2024-04-04T01:09:20Z</dc:date>
    <item>
      <title>Spark Driver failed due to DRIVER_UNAVAILABLE but not due to memory pressure</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-driver-failed-due-to-driver-unavailable-but-not-due-to/m-p/65457#M32814</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a job cluster running a streaming job, and it unexpectedly failed on 19th March with&amp;nbsp;DRIVER_UNAVAILABLE&amp;nbsp;(Request timed out, Driver is temporarily unavailable) in the event log. This is the run:&amp;nbsp;&lt;A href="https://atlassian-discover.cloud.databricks.com/jobs/323849284041517/runs/395169892801478?o=4482001201517624" target="_blank"&gt;https://atlassian-discover.cloud.databricks.com/jobs/323849284041517/runs/395169892801478?o=4482001201517624&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I'm aware of a KB article covering the same problem:&amp;nbsp;&lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;. It points out that memory pressure is a common cause. However, according to the driver stdout, there were only minor GCs, each taking around 30-40 ms, around the time the driver became unavailable:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="duliu_0-1712192893352.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6902iF80733D7E348156E/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="duliu_0-1712192893352.png" alt="duliu_0-1712192893352.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I also checked the driver log (log4j logs). It doesn't contain any error messages, and the few warning messages are unrelated. 
In fact, the driver continued writing logs for several minutes after the&amp;nbsp;DRIVER_UNAVAILABLE&amp;nbsp;error appeared in the event log.&lt;/P&gt;&lt;P&gt;I tried loading the Spark UI, but after a long wait with messages saying it was processing files, it failed with the following error, so I can't see the Spark history UI either:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="duliu_1-1712192913524.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6903i7E36ECA8F5D2D1C9/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="duliu_1-1712192913524.png" alt="duliu_1-1712192913524.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could anyone help, please?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Apr 2024 01:09:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-driver-failed-due-to-driver-unavailable-but-not-due-to/m-p/65457#M32814</guid>
      <dc:creator>duliu</dc:creator>
      <dc:date>2024-04-04T01:09:20Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Driver failed due to DRIVER_UNAVAILABLE but not due to memory pressure</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-driver-failed-due-to-driver-unavailable-but-not-due-to/m-p/65548#M32838</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/60848"&gt;@duliu&lt;/a&gt;, hope you are doing well!&lt;/P&gt;
&lt;P&gt;Could you check whether the KB article below addresses your problem?&lt;/P&gt;
&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Please let me know if this helps, and leave a like if the information is useful. Follow-ups are appreciated.&lt;BR /&gt;Kudos&lt;BR /&gt;Ayushi&lt;/P&gt;</description>
      <pubDate>Fri, 05 Apr 2024 05:16:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-driver-failed-due-to-driver-unavailable-but-not-due-to/m-p/65548#M32838</guid>
      <dc:creator>Ayushi_Suthar</dc:creator>
      <dc:date>2024-04-05T05:16:59Z</dc:date>
    </item>
  </channel>
</rss>

