
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times

rchauhan
New Contributor II

When I try to read data from SQL Server through a JDBC connection, I get the error below while merging the data into a Databricks table. Can you please help identify what is causing this issue?

 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times, most recent failure: Lost task 1.3 in stage 188.0 (TID 1823) (10###.#.# executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Command exited with code 50
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3376)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3308)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3299)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3299)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1428)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1428)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1428)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3588)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3526)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3514)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)

 

3 REPLIES

Tharun-Kumar
Honored Contributor II

@rchauhan 

This error appears when you try to read the data from SQL Server over a single connection. I would suggest using the numPartitions, lowerBound, and upperBound options to parallelize your read.

You can find detailed documentation here - https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20f...
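For illustration, here is a minimal sketch of such a partitioned read (spark is the notebook session; jdbc_url, the credentials, the table dbo.orders, the column id, and the bounds are all hypothetical placeholders):

# Hedged sketch of a partitioned JDBC read; every name below is a
# placeholder, not taken from the original post.
df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)              # e.g. jdbc:sqlserver://<host>:1433;database=<db>
    .option("dbtable", "dbo.orders")      # hypothetical source table
    .option("user", user)
    .option("password", password)
    .option("partitionColumn", "id")      # must be a numeric, date, or timestamp column
    .option("lowerBound", "1")            # min value of the partition column
    .option("upperBound", "1000000")      # max value of the partition column
    .option("numPartitions", 8)           # up to 8 parallel JDBC connections
    .load()
)

Spark splits the range [lowerBound, upperBound] into numPartitions slices and issues one query per slice, so the read no longer funnels through a single connection.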

Hi @Tharun-Kumar. I am already using the numPartitions, lowerBound, and upperBound options to parallelize my data read, but I still see the same error.

df = (
    spark.read
    .option("numPartitions", 32)         # number of parallel JDBC connections
    .option("fetchSize", "1000")         # rows fetched per round trip
    .option("partitionColumn", "Key")    # column the read is split on
    .option("lowerBound", min_o)         # smallest Key value
    .option("upperBound", max_o)         # largest Key value
    .jdbc(url=jdbcUrl, table=f"({query_attr}) t", properties=connectionProperties)
)
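For completeness, min_o and max_o are computed before the read; a sketch of how that might look, assuming the same query_attr, jdbcUrl, and connectionProperties (the exact bounds query is an assumption, not from the original post):

# Hedged sketch: derive the partition bounds with a small
# single-connection JDBC query. "Key" is the partition column from
# the read above; it may need bracket-quoting ([Key]) in T-SQL.
bounds = spark.read.jdbc(
    url=jdbcUrl,
    table=f"(SELECT MIN(Key) AS min_o, MAX(Key) AS max_o FROM ({query_attr}) s) b",
    properties=connectionProperties,
).collect()[0]
min_o, max_o = bounds["min_o"], bounds["max_o"]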

MDV
New Contributor II

@rchauhan did you find a solution to the problem, or do you know which settings caused it?
