<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: org.apache.spark.SparkException: Job aborted due to stage failure: in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/70574#M3032</link>
    <description>&lt;P&gt;Facing the same issue since we moved from Spark 3.2.1 (Databricks 10.4) to Spark 3.3.2 (Databricks 12.2). How come we never saw this problem before, but now we do? Is it Spark related or Databricks related (autoscaling?)&lt;/P&gt;</description>
    <pubDate>Fri, 24 May 2024 08:56:46 GMT</pubDate>
    <dc:creator>Dusan</dc:creator>
    <dc:date>2024-05-24T08:56:46Z</dc:date>
    <item>
      <title>org.apache.spark.SparkException: Job aborted due to stage failure:</title>
      <link>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/59240#M2472</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have around 20 million records in my DF and want to save them to a HORIZONTAL SQL DB.&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;This is the error:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;org.apache.spark.SparkException: Job aborted due to stage failure: A shuffle map stage with indeterminate output was failed and retried. However, Spark cannot rollback the ResultStage 1525 to re-process the input data, and has to fail this job. Please eliminate the indeterminacy by checkpointing the RDD before repartition and try again.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;Here is my code:&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN class=""&gt;df.write.format("jdbc").options(**DB_PROPS, **extra_options, dbtable=table, truncate=truncate).mode(mode).save()&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;Any idea what could be going wrong?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Feb 2024 17:59:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/59240#M2472</guid>
      <dc:creator>Manmohan_Nayak</dc:creator>
      <dc:date>2024-02-04T17:59:58Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure:</title>
      <link>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/59572#M2513</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99300"&gt;@Manmohan_Nayak&lt;/a&gt;&amp;nbsp;Did the resolution work for you?&lt;BR /&gt;I have been facing the same error for the last couple of days in a job that was working earlier.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 10:53:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/59572#M2513</guid>
      <dc:creator>aniketg</dc:creator>
      <dc:date>2024-02-07T10:53:39Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure:</title>
      <link>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/70574#M3032</link>
      <description>&lt;P&gt;Facing the same issue since we moved from Spark 3.2.1 (Databricks 10.4) to Spark 3.3.2 (Databricks 12.2). How come we never saw this problem before, but now we do? Is it Spark related or Databricks related (autoscaling?)&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2024 08:56:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/70574#M3032</guid>
      <dc:creator>Dusan</dc:creator>
      <dc:date>2024-05-24T08:56:46Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure:</title>
      <link>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/71597#M3066</link>
      <description>&lt;P&gt;If a failure leads to a stage retry, and retrying the stage could produce an inconsistent result (indeterminacy), this exception is raised. It is raised in newer versions, where this validation is performed; the validation is likely unavailable in DBR 10.4 and older versions.&lt;/P&gt;
&lt;P&gt;To address the problem, you can follow the error message and checkpoint the DF before the indeterminacy is introduced.&lt;/P&gt;
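&lt;P&gt;As a rough illustration of the checkpointing suggestion, here is a minimal PySpark sketch, not a confirmed fix for this thread. The helper name checkpointed_jdbc_write and its parameters (checkpoint_dir, jdbc_options) are hypothetical; it assumes an existing SparkSession and a checkpoint directory on reliable shared storage. DataFrame.checkpoint(eager=True) materializes the data, so a retried stage can read the saved rows instead of recomputing a non-deterministic plan.&lt;/P&gt;
&lt;PRE&gt;
```python
def checkpointed_jdbc_write(spark, df, checkpoint_dir, jdbc_options, mode="append"):
    """Checkpoint df, then write the checkpointed copy over JDBC.

    Hypothetical helper for illustration only. spark is an existing
    SparkSession, df a DataFrame, checkpoint_dir a path on reliable
    storage, and jdbc_options a dict with keys such as url and dbtable.
    """
    # Tell Spark where checkpoint files should be stored.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    # eager=True materializes the DataFrame immediately and cuts its lineage,
    # so later stages read the saved data rather than recomputing the plan.
    stable_df = df.checkpoint(eager=True)

    # Write the checkpointed DataFrame, mirroring the JDBC call in the question.
    stable_df.write.format("jdbc").options(**jdbc_options).mode(mode).save()
    return stable_df
```
&lt;/PRE&gt;
&lt;P&gt;A call might look like checkpointed_jdbc_write(spark, df, "/tmp/checkpoints", dict(DB_PROPS, dbtable=table)), where the path is only a placeholder. Eager checkpointing writes the full DataFrame to storage first, so it trades extra I/O for a stable input to the JDBC write.&lt;/P&gt;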
&lt;P&gt;This is commonly seen when nodes are lost, for example due to spot instance termination or similar events. I am not fully sure about scale-down events, but they could be another cause.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jun 2024 12:49:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/71597#M3066</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-06-04T12:49:58Z</dc:date>
    </item>
  </channel>
</rss>

