Along with Job aborted due to stage failure: if you see slave lost... then it is due to less memory allocated for executors, more cores per executor more memory required or the other possibility is you have used max cpu available in cluster and the demand is more