We had a failure on a previously running fact table load (our biggest one) and it looked like an executor was failing due to a timeout error. As a test we upped the cluster size and changed the spark.executor.heartbeatinterval to 300s and the spark....