<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks job keep getting failed due to GC issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/108705#M43122</link>
    <description>&lt;P&gt;Hi Sid,&lt;BR /&gt;&lt;BR /&gt;Is this issue resolved? We are experiencing the same problem and wanted to know whether the steps above helped resolve it, and whether you followed any additional steps to resolve your issue.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Tue, 04 Feb 2025 07:26:53 GMT</pubDate>
    <dc:creator>isyed</dc:creator>
    <dc:date>2025-02-04T07:26:53Z</dc:date>
    <item>
      <title>Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/81689#M36396</link>
      <description>&lt;P&gt;We have a job that used to run successfully, but for more than a month it has been running for a long time and then failing. In the stdout log file (attached), there are numerous messages like the following:&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&lt;FONT color="#FF0000"&gt;[GC (Allocation Failure) [PSYoungGen:...]&lt;/FONT&gt;&amp;nbsp; &amp;nbsp; and&amp;nbsp; &amp;nbsp;&lt;FONT color="#FF0000"&gt;[Full GC (System.gc()) [PSYoungGen:...]&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;It seems GC issues are making the job take longer to run, and then it fails every time. In one of the executor logs on the Spark UI Executors page, I see an error message (ExecLossReason.png): &lt;FONT size="2" color="#FF0000"&gt;"Executor decommission: worker decommissioned because of kill request from HTTP endpoint (data migration disabled)"&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;I then added the following to the Spark config parameters:&lt;/P&gt;&lt;P&gt;&lt;FONT size="2" color="#0000FF"&gt;&lt;SPAN&gt;spark.databricks.dataMigration.enabled true&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;I also tried a stronger compute/worker/driver type, but I still get the same failure message.&lt;/P&gt;&lt;P&gt;Any thoughts? How can I resolve this issue, given that the pipeline job works correctly in DEV, UAT, and PROD but not in QA?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Aug 2024 21:25:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/81689#M36396</guid>
      <dc:creator>shahabm</dc:creator>
      <dc:date>2024-08-02T21:25:47Z</dc:date>
    </item>
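    <!--
    Editorial sketch, not part of the original post: one way to quantify the GC messages described above is to count them by kind and cause in the stdout log. The sample lines below are hypothetical and only mimic the shape of the two messages quoted in the post; the function name and regular expression are illustrative assumptions, not a Databricks or Spark API.

```python
import re
from collections import Counter

# Matches the start of a JVM GC log entry such as
# "[GC (Allocation Failure) [PSYoungGen:...]" or "[Full GC (System.gc()) [PSYoungGen:...]".
GC_EVENT = re.compile(r"\[(Full GC|GC) \((.+?)\) \[")


def count_gc_events(log_text):
    """Return a Counter keyed by (kind, cause) for every GC entry found."""
    return Counter(match.groups() for match in GC_EVENT.finditer(log_text))


# Hypothetical sample, shaped like the messages quoted in the post.
sample_log = "\n".join([
    "[GC (Allocation Failure) [PSYoungGen:...] ...]",
    "[GC (Allocation Failure) [PSYoungGen:...] ...]",
    "[Full GC (System.gc()) [PSYoungGen:...] ...]",
])

counts = count_gc_events(sample_log)
for (kind, cause), total in sorted(counts.items()):
    print(f"{kind} / {cause}: {total}")
```

    A steadily rising share of "Full GC" entries over the life of a run would be one hint that the heap is under pressure, although the counts alone do not identify the cause.
    -->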
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/83863#M37049</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your advice worked well: I got rid of&amp;nbsp;&lt;FONT color="#FF0000"&gt;[GC (Allocation Failure) [PSYoungGen:...]&lt;/FONT&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;entirely and, after also picking stronger driver/worker types, the issue in production went away.&lt;/P&gt;&lt;P&gt;I learned that the default GC setting was 'Parallel GC'. After configuring G1GC, I see more balanced GC behavior, and the driver and workers are running somewhat more efficiently.&lt;/P&gt;&lt;P&gt;Thanks again,&lt;/P&gt;&lt;P&gt;Shahab&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2024 00:53:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/83863#M37049</guid>
      <dc:creator>shahabm</dc:creator>
      <dc:date>2024-08-22T00:53:01Z</dc:date>
    </item>
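    <!--
    Editorial sketch, not part of the original post: the thread does not show the exact settings used to switch to G1GC. A common way to request G1GC in Spark is to pass the JVM flag -XX:+UseG1GC through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. The helper names below are hypothetical, and the "key value" rendering simply mirrors the Spark config line quoted in the first post.

```python
def g1gc_spark_conf(extra_jvm_flags=()):
    """Build Spark config entries that ask driver and executor JVMs to use G1GC."""
    jvm_opts = " ".join(["-XX:+UseG1GC", *extra_jvm_flags])
    return {
        "spark.driver.extraJavaOptions": jvm_opts,
        "spark.executor.extraJavaOptions": jvm_opts,
    }


def to_cluster_config_lines(conf):
    """Render the entries as "key value" lines, one setting per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


conf = g1gc_spark_conf()
config_text = to_cluster_config_lines(conf)
print(config_text)
```

    Whether G1GC helps depends on the workload and runtime version, so it is worth comparing GC logs before and after the change.
    -->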
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99720#M40071</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114119"&gt;@shahabm&lt;/a&gt;&amp;nbsp;, I'm facing exactly the same issue, and increasing the driver type or the number of workers isn't helping either. Could you please explain how it got resolved for you? I don't see the comment or post in which you received the advice. This problem is causing a lot of delays and escalations in delivery. I would appreciate your timely guidance on it.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Sid&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2024 02:17:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99720#M40071</guid>
      <dc:creator>siddhu30</dc:creator>
      <dc:date>2024-11-22T02:17:08Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99769#M40086</link>
      <description>&lt;P&gt;Hi Sid,&lt;/P&gt;&lt;P&gt;This is the list of action items that helped me resolve the issue:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Changed the GC algorithm to G1GC. In my case it used fewer resources and was more efficient.&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;Archived old ingested files, which decreased the workload.&lt;/LI&gt;&lt;LI&gt;Chose a stronger cluster with a better-matching Databricks runtime version: 11.3 LTS (Spark 3.3.0, Scala 2.12).&lt;/LI&gt;&lt;LI&gt;Chose a reasonable number of workers, min (1) and max (3), based on the log.&lt;/LI&gt;&lt;LI&gt;Tried to resolve all the warning messages I could see in the log.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;With these actions, the first step of my workflow ran, but the next step hit a separate file system issue: it could not find some of the Delta tables or the locations they were stored in. I resolved it as follows, which let me complete the job:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Mounted the missing file systems after comparing the mounts to a working environment. I used dbutils to list the mount points, found a misleading mounted location, and removed it, which resolved the issue.&lt;/LI&gt;&lt;LI&gt;I ran a few performance improvements on my Delta tables afterward.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2024 14:12:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99769#M40086</guid>
      <dc:creator>shahabm</dc:creator>
      <dc:date>2024-11-22T14:12:36Z</dc:date>
    </item>
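    <!--
    Editorial sketch, not part of the original post: the mount comparison described above can be expressed as a diff between two mount tables. The function name, mount points, and sources below are hypothetical. On Databricks, each mapping could presumably be built from the output of dbutils.fs.mounts() in the respective environment (mount point mapped to source); that step is an assumption and is left out so the example runs anywhere.

```python
def diff_mounts(reference, target):
    """Compare two {mount_point: source} mappings.

    Returns a tuple (missing, extra, mismatched):
      missing:    mount points in reference but not in target
      extra:      mount points in target but not in reference
      mismatched: mount points in both that point at different sources
    """
    missing = sorted(set(reference) - set(target))
    extra = sorted(set(target) - set(reference))
    mismatched = sorted(
        point
        for point in set(reference) & set(target)
        if reference[point] != target[point]
    )
    return missing, extra, mismatched


# Hypothetical mount tables for a working and a failing environment.
working_env = {"/mnt/raw": "storage-a/raw", "/mnt/delta": "storage-a/delta"}
failing_env = {
    "/mnt/raw": "storage-a/raw",
    "/mnt/delta": "storage-b/old",
    "/mnt/tmp": "storage-a/tmp",
}

missing, extra, mismatched = diff_mounts(working_env, failing_env)
print("missing:", missing)
print("extra:", extra)
print("mismatched:", mismatched)
```

    A mismatched entry corresponds to the "misleading mounted location" case in the post: the mount point exists but resolves to a different source than in the working environment.
    -->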
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99774#M40088</link>
      <description>&lt;P&gt;Thanks a lot&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114119"&gt;@shahabm&lt;/a&gt;&amp;nbsp;for your prompt response, appreciate it. I'll try to debug in this direction.&lt;/P&gt;&lt;P&gt;Thanks again!&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2024 14:56:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/99774#M40088</guid>
      <dc:creator>siddhu30</dc:creator>
      <dc:date>2024-11-22T14:56:17Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/108705#M43122</link>
      <description>&lt;P&gt;Hi Sid,&lt;BR /&gt;&lt;BR /&gt;Is this issue resolved? We are experiencing the same problem and wanted to know whether the steps above helped resolve it, and whether you followed any additional steps to resolve your issue.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 04 Feb 2025 07:26:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/108705#M43122</guid>
      <dc:creator>isyed</dc:creator>
      <dc:date>2025-02-04T07:26:53Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job keep getting failed due to GC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/149189#M53036</link>
      <description>&lt;P&gt;Hi isyed,&lt;/P&gt;&lt;P&gt;Apologies for the late response.&lt;/P&gt;&lt;P&gt;For our use case, we changed the code from PySpark DataFrames to Spark SQL, which writes to tables and then performs the next loop instead of keeping all the records in memory. Ours is a typical hierarchical loop over 200 million records. Each loop used to store its results in a DataFrame and then calculate the next level of the hierarchy, and so on, which caused the issue. After changing the logic to SQL (inserting records into temp tables on every loop), the code ran faster because no data is held in memory: on every loop the records are written to temp tables, and the record count for the next loop is reduced.&lt;/P&gt;&lt;P&gt;Hope this helps.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 15:16:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-job-keep-getting-failed-due-to-gc-issue/m-p/149189#M53036</guid>
      <dc:creator>siddhu30</dc:creator>
      <dc:date>2026-02-24T15:16:06Z</dc:date>
    </item>
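    <!--
    Editorial sketch, not part of the original post: the loop-and-persist pattern described above can be illustrated with a small level-by-level hierarchy walk. This is not the poster's code. It uses Python's built-in sqlite3 as a stand-in for Spark SQL temp tables, and the table names, column names, and sample data are hypothetical. The point it shows is that each pass reads only the previous level from a table and writes the next level back, rather than accumulating results in memory.

```python
import sqlite3


def resolve_hierarchy(edges, root):
    """Assign a depth to every node reachable from root, one level per loop."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (parent TEXT, child TEXT)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    con.execute("CREATE TABLE levels (node TEXT PRIMARY KEY, depth INTEGER)")
    con.execute("INSERT INTO levels VALUES (?, 0)", (root,))

    depth = 0
    while True:
        # Read only the previous level and write the next level to the table.
        # OR IGNORE skips nodes that already have a depth, so cycles terminate.
        con.execute(
            "INSERT OR IGNORE INTO levels (node, depth) "
            "SELECT e.child, ? FROM edges AS e "
            "JOIN levels AS l ON l.node = e.parent "
            "WHERE l.depth = ?",
            (depth + 1, depth),
        )
        new_rows = con.execute(
            "SELECT COUNT(*) FROM levels WHERE depth = ?", (depth + 1,)
        ).fetchone()[0]
        if new_rows == 0:  # nothing new at this level: hierarchy is resolved
            break
        depth += 1

    rows = con.execute(
        "SELECT node, depth FROM levels ORDER BY depth, node"
    ).fetchall()
    con.close()
    return rows


# Hypothetical sample hierarchy.
sample_edges = [
    ("ceo", "vp1"),
    ("ceo", "vp2"),
    ("vp1", "eng1"),
    ("eng1", "intern"),
]
rows = resolve_hierarchy(sample_edges, "ceo")
print(rows)
```

    In Spark the equivalent choice is between chaining DataFrame transformations across iterations and materializing each iteration to a table; the post reports that the second approach resolved the memory pressure in that workload.
    -->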
  </channel>
</rss>

