This is likely caused by one of two things: the cluster running short of resources (scalability) or a timeout.
For scalability: consider switching to a larger node type (more memory and cores per node).
For timeouts: set the following in the cluster's Spark config:
spark.executor.heartbeatInterval 300s
spark.network.timeout 320s
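Spark's configuration documentation says spark.executor.heartbeatInterval should be significantly less than spark.network.timeout, so when raising one you should keep the other above it. The snippet below is a minimal sketch (not a Spark or Databricks API) that parses "key value" lines like the ones above and checks that relationship. The helper names parse_seconds, parse_spark_conf and check_heartbeat are hypothetical, and the duration parser is simplified: it only understands ms, s, m/min and h suffixes and rejects unitless values.

```python
import re

# Simplified duration parser (assumption: every value carries a unit suffix).
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "min": 60.0, "h": 3600.0}


def parse_seconds(value: str) -> float:
    """Convert a Spark-style duration such as '300s' or '5min' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s|min|m|h)\s*", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def parse_spark_conf(text: str) -> dict:
    """Parse 'key value' lines (one setting per line) into a dict."""
    conf = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(None, 1)
        conf[key] = value.strip()
    return conf


def check_heartbeat(conf: dict) -> bool:
    """Return True if the heartbeat interval is below the network timeout.

    Falls back to Spark's documented defaults (10s and 120s) when a key is absent.
    """
    heartbeat = parse_seconds(conf.get("spark.executor.heartbeatInterval", "10s"))
    timeout = parse_seconds(conf.get("spark.network.timeout", "120s"))
    return heartbeat < timeout


if __name__ == "__main__":
    CLUSTER_CONF = """
    spark.executor.heartbeatInterval 300s
    spark.network.timeout 320s
    """
    conf = parse_spark_conf(CLUSTER_CONF)
    print(check_heartbeat(conf))  # True: 300s < 320s
```

With the values suggested above the check passes; if you later raise the heartbeat interval to, say, 400s without also raising spark.network.timeout, it would return False.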