Hello All,
My DataFrame has 1 million records, and one of its columns contains XML documents. I am parsing the XML with the `xpath` function. It works fine for a small record count, but it fails when run over the full 1 million records.
Error message: pyspark.errors.exceptions.connect.SparkException: Job aborted due to stage failure: Task 5 in stage 414054.0 failed 4 times, most recent failure: Lost task 5.14 in stage 414054.0 (TID 1658725) (172.18.1.205 executor 316): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://........./__unitystorage/schemas/cb65ef1e-aed6-4a14-b92e-1bd9c830b491/tables/4580f459-ff87-49a4-9f7d-3902e67e0a91. SQLSTATE: 58030
Caused by: java.lang.RuntimeException: Error loading expression '/SSEVENT/KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE/text()
Caused by: java.util.MissingResourceException: Could not load any resource bundle by com.sun.org.apache.xerces.internal.impl.msg.XMLMessages
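For reference, the XPath expression from the error log can be exercised outside Spark with Python's standard library. The element names below come from the log; the sample document and the `extract_type_code` helper are hypothetical, just to show what the expression is meant to extract on a single row:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped to match the failing XPath
# /SSEVENT/KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE/text()
sample_xml = """
<SSEVENT>
  <KKGKXA>
    <GKLO-HEADER>
      <GKLO-KEY>
        <GKLO-TYPE-CODE>AB1</GKLO-TYPE-CODE>
      </GKLO-KEY>
    </GKLO-HEADER>
  </KKGKXA>
</SSEVENT>
"""

def extract_type_code(xml_string):
    """Pure-Python equivalent of the failing XPath expression."""
    root = ET.fromstring(xml_string)  # root element is <SSEVENT>
    # ElementTree paths are relative to the root, so /SSEVENT is implicit
    node = root.find("KKGKXA/GKLO-HEADER/GKLO-KEY/GKLO-TYPE-CODE")
    return node.text if node is not None else None

print(extract_type_code(sample_xml))  # -> AB1
```

Parsing like this inside a Python UDF would be one way to take the JVM Xerces parser (the source of the `MissingResourceException`) out of the picture, at the cost of UDF serialization overhead.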
Is this a memory issue? How can I handle this situation?