cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4

rchauhan
New Contributor II

When I am trying to read the data from sql server through jdbc connect , I get the below error while merging the data into databricks table . Can you please help whats the issue related to? 

 

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times, most recent failure: Lost task 1.3 in stage 188.0 (TID 1823) (10###.#.# executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Command exited with code 50 Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3376) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3308) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3299) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3299) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1428) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1428) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1428) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3588) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3526) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3514) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)

 

3 REPLIES 3

Tharun-Kumar
Databricks Employee
Databricks Employee

@rchauhan 

This error appears when we try to read the data from SQL server using a single connection. I would suggest to use numPartitions, lowerBound and upperBound configs to parallelize your data read.

You can find a detailed documentation here - https://docs.databricks.com/en/external-data/jdbc.html#:~:text=save()%0A)-,Control%20parallelism%20f...

Hi @Tharun-Kumar . I am already using numPartitions, lowerBound and upperBound configs to parallelize my data read. Still I see the same error.

df=spark.read.option("numPartitions", 32).option("fetchSize", "1000").option("partitionColumn", "Key").option("lowerBound", min_o).option("upperBound", max_o).jdbc(url=jdbcUrl,table=f"({query_attr}) t ",properties=connectionProperties)

MDV
New Contributor II

@rchauhan did you find a solution to the problem or know what settings caused the problem ?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group