org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4
08-01-2023 04:58 PM
When I am trying to read data from SQL Server through a JDBC connection, I get the error below while merging the data into a Databricks table. Can you please help me understand what this issue is related to?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times, most recent failure: Lost task 1.3 in stage 188.0 (TID 1823) (10###.#.# executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Command exited with code 50
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3376)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3308)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3299)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3299)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1428)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1428)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1428)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3588)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3526)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3514)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
- Labels: Spark
08-02-2023 12:06 AM
This error appears when the data is read from SQL Server over a single connection. I would suggest using the numPartitions, lowerBound and upperBound options to parallelize your data read.
You can find detailed documentation here: https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20f...
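For reference, a minimal sketch of a partitioned JDBC read along those lines. The host, database, table, and column names below are placeholders, not values from this thread:

# Illustrative sketch only: URL, credentials, table and column names are placeholders.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
      .option("dbtable", "dbo.my_table")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("partitionColumn", "id")   # must be a numeric, date, or timestamp column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")   # bounds only control how the range is split, they do not filter rows
      .option("numPartitions", 8)        # upper limit on concurrent JDBC connections
      .load())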
08-02-2023 10:58 AM
Hi @Tharun-Kumar, I am already using the numPartitions, lowerBound and upperBound options to parallelize my data read, but I still see the same error.
df = (spark.read
      .option("numPartitions", 32)
      .option("fetchSize", "1000")
      .option("partitionColumn", "Key")
      .option("lowerBound", min_o)
      .option("upperBound", max_o)
      .jdbc(url=jdbcUrl, table=f"({query_attr}) t ", properties=connectionProperties))
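One thing that may be worth checking (a hedged suggestion, not something established in this thread): if the values of Key are heavily skewed, a few partitions can still receive most of the rows and overwhelm a single executor even with 32 partitions. A quick way to inspect how the rows are spread across partitions after the read:

# Hedged diagnostic sketch: counts rows per Spark partition of the df defined above
# to see whether partitioning on "Key" produces badly skewed partitions.
from pyspark.sql.functions import spark_partition_id

(df.withColumn("pid", spark_partition_id())
   .groupBy("pid")
   .count()
   .orderBy("count", ascending=False)
   .show(32))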
03-25-2024 08:38 AM
@rchauhan, did you find a solution to the problem, or do you know which settings caused it?

