Executor heartbeat timed out

nadia
New Contributor II

Hello, I'm trying to read a table in PostgreSQL that contains 28 million rows, and I get the following error:

"SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.139.64.6 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 161734 ms"

Could you please help me?

Thanks

Accepted Solution

Prabakar
Esteemed Contributor III

This could have one of two causes: scalability (the nodes are too small for the workload) or a timeout.

For scalability: consider switching to a larger node type.

For the timeout: set the following in the cluster's Spark config:

spark.executor.heartbeatInterval 300s
spark.network.timeout 320s
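Spark's configuration documentation says `spark.executor.heartbeatInterval` should be significantly less than `spark.network.timeout`; by default the driver uses the network timeout to decide when an executor whose heartbeats stopped is lost. The sketch below is a small pre-flight check for a pair of values before pasting them into the cluster config. `to_ms` and `check_heartbeat_config` are hypothetical helpers, not part of Spark, and only a subset of Spark's duration suffixes is parsed (bare numbers are rejected on purpose, since the default unit differs between the two settings).

```python
from __future__ import annotations

import re

# Hypothetical helper: subset of Spark's time suffixes, mapped to milliseconds.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def to_ms(value: str) -> int:
    """Convert a duration such as '300s' or '10min' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|min|m|h)\s*", value)
    if match is None:
        # Bare numbers are rejected: Spark's default unit differs per setting.
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit]


def check_heartbeat_config(conf: dict[str, str]) -> tuple[int, int]:
    """Hypothetical check: the heartbeat interval must be below the network timeout.

    Returns (interval_ms, timeout_ms) so the margin between them can be inspected.
    """
    interval_ms = to_ms(conf["spark.executor.heartbeatInterval"])
    timeout_ms = to_ms(conf["spark.network.timeout"])
    if interval_ms >= timeout_ms:
        raise ValueError(
            f"spark.executor.heartbeatInterval ({interval_ms} ms) must be less than "
            f"spark.network.timeout ({timeout_ms} ms)"
        )
    return interval_ms, timeout_ms


if __name__ == "__main__":
    # The values suggested in the accepted solution.
    cluster_conf = {
        "spark.executor.heartbeatInterval": "300s",
        "spark.network.timeout": "320s",
    }
    interval_ms, timeout_ms = check_heartbeat_config(cluster_conf)
    print(f"OK, margin: {(timeout_ms - interval_ms) // 1000}s")  # prints "OK, margin: 20s"
```

The pair from the answer passes the strict check, although the gap between the two values is only 20 seconds. On Databricks the settings still go into the cluster's Spark config as shown in the answer; for a self-managed PySpark session the same key/value pairs could be passed through `SparkSession.builder.config(key, value)` before the session is created.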

jose_gonzalez
Moderator

Hi @Boumaza nadia,

Did you check the logs for executor 3 while the cluster was still running? If you get this error again, I highly recommend checking the executor's logs to confirm what caused the issue.