
org.apache.spark.SparkException: Job aborted due to stage failure during Model Training

VeereshKH
New Contributor II

org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(160, 13) finished unsuccessfully.

4 REPLIES

Yeshwanth
Valued Contributor II

Hi @VeereshKH, I hope you are doing well.

Could you please share the complete error message and confirm whether any Spark configuration properties are set on the cluster?

We have seen the Spark configuration property "spark.databricks.pyspark.enableProcessIsolation" cause this problem in multiple scenarios. If you have set this property, please try removing it and rerunning the code.
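One quick way to check for that property is sketched below. This is a minimal illustration, not an official diagnostic: the helper name `find_flagged_setting` is hypothetical, and it assumes the config object exposes a dict-style `.get(key, default)` method. A plain dict is used here as a stand-in; in a notebook you would presumably pass `spark.conf` instead.

```python
# Property named in this thread as a possible cause of the failure.
FLAGGED_KEY = "spark.databricks.pyspark.enableProcessIsolation"


def find_flagged_setting(conf, key=FLAGGED_KEY):
    """Return the configured value for `key`, or None if it is not set.

    `conf` is any object with a dict-style .get(key, default) method,
    for example a plain dict or (assumed) the notebook's `spark.conf`.
    """
    sentinel = "<unset>"  # string default, so string-only config getters accept it
    value = conf.get(key, sentinel)
    return None if value == sentinel else value


# Hypothetical stand-in for the cluster's Spark configuration.
cluster_conf = {
    "spark.databricks.pyspark.enableProcessIsolation": "true",
    "spark.sql.shuffle.partitions": "200",
}

value = find_flagged_setting(cluster_conf)
if value is not None:
    print(f"{FLAGGED_KEY} is set to {value!r}; try removing it and rerunning.")
```

If the property turns out to be set, it would normally be removed from the cluster's Spark config and the cluster restarted before rerunning the training job.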

Please keep us posted with the results.

Thank you and wishing you an amazing day ahead!

Lakshay
Esteemed Contributor

Could you share the stage details where the issue happened?

VeereshKH
New Contributor II

Stage failed because barrier task ResultTask(66, 1) finished unsuccessfully.

ExecutorLostFailure (executor 13 exited unrelated to the running tasks)

Reason: Executor decommission.
org.apache.spark.rdd.RDD.collect(RDD.scala:1034)
org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:260)
org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
py4j.Gateway.invoke(Gateway.java:295)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:251)
java.lang.Thread.run(Thread.java:748)
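For anyone collecting the stage details requested above, the failure text can be pulled apart with a small helper. This is only a sketch: `parse_barrier_failure` is a hypothetical name, and it assumes the two numbers in `ResultTask(66, 1)` are the stage ID and partition ID, which matches how Spark prints a ResultTask as far as I know.

```python
import re

# Matches the "barrier task ResultTask(<stage>, <partition>)" fragment
# seen in the error messages in this thread.
BARRIER_TASK = re.compile(r"barrier task ResultTask\((\d+), (\d+)\)")


def parse_barrier_failure(message):
    """Extract stage/partition numbers and a decommission flag from an error message.

    Returns None if the message does not mention a failed barrier task.
    """
    match = BARRIER_TASK.search(message)
    if match is None:
        return None
    return {
        "stage_id": int(match.group(1)),      # assumed: first number is the stage ID
        "partition_id": int(match.group(2)),  # assumed: second number is the partition ID
        "executor_decommissioned": "Executor decommission" in message,
    }


# Message text copied from the post above.
message = (
    "Stage failed because barrier task ResultTask(66, 1) finished unsuccessfully. "
    "ExecutorLostFailure (executor 13 exited unrelated to the running tasks) "
    "Reason: Executor decommission."
)

details = parse_barrier_failure(message)
print(details)
```

The stage ID can then be looked up in the Spark UI to see which executor was lost and when it was decommissioned.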