We had a failure on a fact table load (our biggest one) that had previously been running fine, and it looked like an executor was failing with a timeout error. As a test we upped the cluster size and set spark.executor.heartbeatInterval to 300s and spark.network.timeout to 600s. However, the job still fails, reporting "Executor heartbeat timed out after XXXXX ms". Looking further in the logs we noted an error message with the code XXKDA and not much other information. The Databricks website suggests that error warrants a bug report, although we're not sure about that.
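For reference, this is roughly how we applied the two timeout settings (the job script name is a placeholder; the values are the ones from our test). One thing we did check: the heartbeat interval has to stay well below the network timeout, which 300s vs 600s satisfies.

```shell
# Sketch of the config changes we tried (not a fix, just what we ran).
# spark.executor.heartbeatInterval must be significantly less than
# spark.network.timeout, otherwise Spark rejects the configuration.
spark-submit \
  --conf spark.executor.heartbeatInterval=300s \
  --conf spark.network.timeout=600s \
  our_fact_table_load.py   # placeholder for the actual job
```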
Has anyone got anything else we could check, or any idea what the XXKDA error code might mean? We're currently trying a remedial action of reducing the amount of data (it's a large merge on a fact table).