06-12-2022 02:19 PM
Hello, I'm trying to read a table that is stored in PostgreSQL and contains 28 million rows. The job fails with the following error:
"SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.6 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 161734 ms"
Could you help me please?
Thanks
07-05-2022 07:50 AM
This can happen for one of two reasons: scalability or timeout.
For scalability, consider moving to a larger node type.
For timeout, set the following in the cluster Spark config:
spark.executor.heartbeatInterval 300s
spark.network.timeout 320s
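As a quick sanity check on these two values: the Spark configuration docs state that spark.executor.heartbeatInterval should be significantly less than spark.network.timeout. The sketch below computes the margin between them. The helper names `to_seconds` and `heartbeat_headroom` are hypothetical and not part of any Spark API, and only a few duration suffixes are handled.

```python
import re

# Seconds per unit for a few of the duration suffixes Spark accepts.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "min": 60.0, "h": 3600.0}


def to_seconds(duration):
    """Convert a duration string such as '300s' or '5min' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


def heartbeat_headroom(conf):
    """Return network timeout minus heartbeat interval, in seconds."""
    timeout = to_seconds(conf["spark.network.timeout"])
    interval = to_seconds(conf["spark.executor.heartbeatInterval"])
    return timeout - interval


# The values suggested in this reply.
conf = {
    "spark.executor.heartbeatInterval": "300s",
    "spark.network.timeout": "320s",
}
print(heartbeat_headroom(conf))  # 20.0
```

With the values above the margin is 20 seconds. Whether that is enough for a given workload is a judgment call, so a shorter heartbeat interval may be worth trying if timeouts persist.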
10-17-2024 11:08 PM
I set these properties at the cluster level, but the issue is not resolved.
I am trying to read an Oracle table over JDBC and write it to Unity Catalog.
When I pass a high value such as 100 or 50 to .option("numPartitions", partitions) to achieve maximum parallelism, I get this heartbeat timed out error.
Cluster conf: my cluster has a minimum of 5 machines (20 cores, 140 GB each), with auto-scaling set to 10.
But when I reduce numPartitions to 25, the issue does not occur and everything runs fine.
The data is a few tables with around 173313859 rows.
Any reasoning for this?
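For reference, a partitioned JDBC read of the kind described above might look like the following minimal sketch. The option keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are standard Spark JDBC options. The helper names, the URL, the table, the column, the bounds, and the fetch size are placeholders assumed for illustration, not values from this thread. Note that, per the Spark JDBC documentation, numPartitions also sets the maximum number of concurrent JDBC connections.

```python
def jdbc_read_options(url, table, partition_column,
                      lower_bound, upper_bound,
                      num_partitions, fetch_size=10_000):
    """Build the option dict for a partitioned Spark JDBC read."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be less than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        # Numeric, date, or timestamp column used to split the read.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        # Upper limit on read parallelism and on concurrent connections.
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip to the database.
        "fetchsize": str(fetch_size),
    }


def read_table(spark, options):
    """Load the table with an existing SparkSession (not created here)."""
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details, assumed for illustration only.
options = jdbc_read_options(
    url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",
    table="SALES.ORDERS",
    partition_column="ORDER_ID",
    lower_bound=1,
    upper_bound=1_000_000,
    num_partitions=25,
)
print(options["numPartitions"])  # 25
```

On a cluster, `read_table(spark, options)` would return the DataFrame. Keeping the options in a plain dict makes it easy to compare runs at 25, 50, and 100 partitions.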
07-07-2022 05:26 PM
Hi @Boumaza nadia,
Did you check the executor 3 logs while the cluster was active? If you get this error message again, I highly recommend checking the executor's logs to confirm what caused the issue.
06-18-2024 01:52 PM
Please also review the Spark UI to see the failed Spark job and Spark stage. Please check on the GC time and data spill to memory and disk. See if there is any error in the failed task in the Spark stage view. This will confirm data skew or GC/memory issues with the executors.
Then, also add spark.task.cpus 2 to the spark config to allocate two cores to run one task.
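To illustrate the effect of that setting: the number of tasks an executor can run at once is its core count divided by spark.task.cpus, so giving each task two cores halves the concurrency per executor. The sketch below does that arithmetic. The function name `task_slots` is hypothetical, and the 20-core executor is an assumed figure.

```python
def task_slots(executor_cores, task_cpus=1):
    """Number of tasks one executor can run concurrently."""
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("need 1 <= task_cpus <= executor_cores")
    # Integer division: leftover cores cannot host a full task.
    return executor_cores // task_cpus


print(task_slots(20, 1))  # 20
print(task_slots(20, 2))  # 10
```

Fewer concurrent tasks means each task gets a larger share of the executor's memory, which may help if the Spark UI shows high GC time or spill.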