Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Troubleshooting Cluster

AndySkinner
New Contributor II

 

We had a failure on a fact table load that had previously run fine (it is our biggest one), and it looked like an executor was failing with a timeout error. As a test we increased the cluster size and set spark.executor.heartbeatInterval to 300s and spark.network.timeout to 600s. However, that job still fails, reporting "Executor heartbeat timed out after XXXXX ms". Looking further in the logs, we found an error message with the code XXKDA and not much other information. The Databricks website suggests that a bug report is in order for that code, although we are not sure about that.
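For anyone wanting to sanity-check the two timeout values, here is a minimal sketch in plain Python (no Spark needed). It relies on the Spark documentation's guidance that the heartbeat interval should be significantly less than the network timeout. The helper names `to_ms` and `timeout_ratio` are made up for illustration, and the sketch assumes every value carries an explicit unit suffix.

```python
import re

# Milliseconds per unit, for the time suffixes handled by this sketch.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000,
            "h": 3_600_000, "d": 86_400_000}


def to_ms(value: str) -> int:
    """Convert a time string such as '300s' or '10min' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|min|h|d)\s*", value)
    if match is None:
        raise ValueError(f"expected a number with a unit suffix, got {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit]


def timeout_ratio(conf: dict) -> float:
    """Return network timeout / heartbeat interval; raise if the order is wrong."""
    heartbeat = to_ms(conf["spark.executor.heartbeatInterval"])
    network = to_ms(conf["spark.network.timeout"])
    if heartbeat >= network:
        raise ValueError("heartbeat interval must be smaller than network timeout")
    return network / heartbeat


# The values from this post.
conf = {
    "spark.executor.heartbeatInterval": "300s",
    "spark.network.timeout": "600s",
}
print(timeout_ratio(conf))
```

This only confirms that the interval is below the timeout and shows how far apart the two are. It says nothing about why the executor stops sending heartbeats in the first place.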

Has anyone got anything else that we could check, or any idea what the XXKDA error code could mean? We're currently trying a remedial action of reducing the amount of data (it's a large merge on a fact table).
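A rough sketch of one way the data reduction could be done, assuming the source rows carry a date column so the merge can run in smaller windows. The column name `load_date`, the date range, and the window size are all hypothetical, and the actual merge statement is left out.

```python
from datetime import date, timedelta


def date_windows(start: date, end: date, days: int):
    """Yield half-open [lo, hi) windows covering start..end in steps of `days`."""
    if days <= 0:
        raise ValueError("days must be positive")
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        yield lo, hi
        lo = hi


# Hypothetical range and window size; each window would be merged separately.
for lo, hi in date_windows(date(2024, 1, 1), date(2024, 1, 31), 10):
    # Placeholder for the real work: filter the source to this window,
    # then run the merge for just those rows.
    print(f"merge rows where load_date >= '{lo}' and load_date < '{hi}'")
```

Smaller windows mean each merge handles less data at once, at the cost of more merge operations against the fact table.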

0 REPLIES 0
