<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-task-write-failed-task-failed/m-p/101257#M40598</link>
    <description>&lt;P&gt;Thank you for your question. The error is most likely caused by memory pressure or inefficient processing of the large dataset. Parsing XML with XPath is resource-intensive, and handling 1 million records requires optimization.&lt;/P&gt;
&lt;P&gt;You can try&amp;nbsp;df = df.repartition(100), increase the&amp;nbsp;spark.task.cpus setting from 1 to 2, or use larger executors. This will at least give you insight into how much capacity is truly required and whether the data is fully and evenly parallelized, so you can tune further from there.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 06 Dec 2024 16:09:47 GMT</pubDate>
    <dc:creator>VZLA</dc:creator>
    <dc:date>2024-12-06T16:09:47Z</dc:date>
    <item>
      <title>org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-task-write-failed-task-failed/m-p/101177#M40577</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;My DataFrame has 1 million records, and it contains XML documents as a column value. I am trying to parse the XML using the XPath function. It works fine for small record counts, but it fails when run against 1 million records.&lt;/P&gt;&lt;P&gt;Error Message: pyspark.errors.exceptions.connect.SparkException: Job aborted due to stage failure: Task 5 in stage 414054.0 failed 4 times, most recent failure: Lost task 5.14 in stage 414054.0 (TID 1658725) (172.18.1.205 executor 316): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://........./__unitystorage/schemas/cb65ef1e-aed6-4a14-b92e-1bd9c830b491/tables/4580f459-ff87-49a4-9f7d-3902e67e0a91. SQLSTATE: 58030&lt;/P&gt;&lt;P&gt;Caused by: java.lang.RuntimeException: Error loading expression '/SSEVENT/KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE/text()&lt;/P&gt;&lt;P&gt;Caused by: java.util.MissingResourceException: Could not load any resource bundle by com.sun.org.apache.xerces.internal.impl.msg.XMLMessages&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this a memory issue? How can I handle this situation?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 08:04:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-task-write-failed-task-failed/m-p/101177#M40577</guid>
      <dc:creator>satyasamal</dc:creator>
      <dc:date>2024-12-06T08:04:45Z</dc:date>
    </item>
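The failing expression in the question extracts element text via an XPath path. As a minimal sketch outside Spark, the same extraction can be reproduced with Python's standard library; the element names are borrowed from the error message, but the document structure and sample value are assumptions, not the poster's real payload:

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the nesting below is inferred from the XPath
# '/SSEVENT/KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE/text()' in the
# error message; real records will differ.
xml_payload = """
<SSEVENT>
  <KKGKXA>
    <GKLO-HEADER>
      <GKLO-KEY>
        <GKLO-TYPE-CODE>ABC123</GKLO-TYPE-CODE>
      </GKLO-KEY>
    </GKLO-HEADER>
  </KKGKXA>
</SSEVENT>
"""

root = ET.fromstring(xml_payload)
# ElementTree supports a limited XPath subset; this is the same path,
# written relative to the root element.
node = root.find("./KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE")
type_code = node.text if node is not None else None
print(type_code)  # → ABC123
```

Running a small extraction like this against a sample record is one way to confirm the XPath itself is valid before blaming cluster resources; if the path works on a single document, the failure at 1 million records points toward memory or parallelism rather than the expression.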
    <item>
      <title>Re: org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-task-write-failed-task-failed/m-p/101257#M40598</link>
      <description>&lt;P&gt;Thank you for your question. The error is most likely caused by memory pressure or inefficient processing of the large dataset. Parsing XML with XPath is resource-intensive, and handling 1 million records requires optimization.&lt;/P&gt;
&lt;P&gt;You can try&amp;nbsp;df = df.repartition(100), increase the&amp;nbsp;spark.task.cpus setting from 1 to 2, or use larger executors. This will at least give you insight into how much capacity is truly required and whether the data is fully and evenly parallelized, so you can tune further from there.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 16:09:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-task-write-failed-task-failed/m-p/101257#M40598</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-06T16:09:47Z</dc:date>
    </item>
  </channel>
</rss>

