<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38888#M26804</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85392"&gt;@rchauhan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This error appears when we try to read the data from SQL server using a single connection. I would suggest to use numPartitions, lowerBound and upperBound configs to parallelize your data read.&lt;/P&gt;&lt;P&gt;You can find a detailed documentation here -&amp;nbsp;&lt;A href="https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20for%20JDBC%20queries,-By%20default%2C%20the" target="_blank"&gt;https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20for%20JDBC%20queries,-By%20default%2C%20the&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 02 Aug 2023 07:06:51 GMT</pubDate>
    <dc:creator>Tharun-Kumar</dc:creator>
    <dc:date>2023-08-02T07:06:51Z</dc:date>
    <item>
      <title>org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38875#M26798</link>
      <description>&lt;P&gt;&lt;SPAN&gt;When I am trying to read the data from sql server through jdbc connect , I get the below error while merging the data into databricks table . Can you please help whats the issue related to?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times, most recent failure: Lost task 1.3 in stage 188.0 (TID 1823) (10###.#&lt;/SPAN&gt;&lt;SPAN&gt;.# executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Command exited with code 50 Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3376) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3308) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3299) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3299) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1428) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1428) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1428) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3588) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3526) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3514) at 
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2023 23:58:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38875#M26798</guid>
      <dc:creator>rchauhan</dc:creator>
      <dc:date>2023-08-01T23:58:02Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38888#M26804</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85392"&gt;@rchauhan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This error appears when we try to read the data from SQL server using a single connection. I would suggest to use numPartitions, lowerBound and upperBound configs to parallelize your data read.&lt;/P&gt;&lt;P&gt;You can find a detailed documentation here -&amp;nbsp;&lt;A href="https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20for%20JDBC%20queries,-By%20default%2C%20the" target="_blank"&gt;https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20for%20JDBC%20queries,-By%20default%2C%20the&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2023 07:06:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38888#M26804</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-08-02T07:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38958#M26825</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/39403"&gt;@Tharun-Kumar&lt;/a&gt;&amp;nbsp;. I am already using&amp;nbsp;&lt;SPAN&gt;numPartitions, lowerBound and upperBound configs to parallelize my data read. Still I see the same error.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;df=spark.read.option("numPartitions", 32).option("fetchSize", "1000").option("partitionColumn", "Key").option("lowerBound", min_o).option("upperBound", max_o).jdbc(url=jdbcUrl,table=f"({query_attr}) t ",properties=connectionProperties)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2023 17:58:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/38958#M26825</guid>
      <dc:creator>rchauhan</dc:creator>
      <dc:date>2023-08-02T17:58:35Z</dc:date>
    </item>
    <item>
      <title>Re: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed</title>
      <link>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/64531#M32598</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85392"&gt;@rchauhan&lt;/a&gt;&amp;nbsp;did you find a solution to the problem or know what settings caused the problem?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2024 15:38:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure/m-p/64531#M32598</guid>
      <dc:creator>MDV</dc:creator>
      <dc:date>2024-03-25T15:38:00Z</dc:date>
    </item>
  </channel>
</rss>

