
What will happen if a driver or worker node fails?

abd
Contributor

What will happen if the driver node fails?

What will happen if one of the worker nodes fails?

Is it the same in Spark and Databricks, or does Databricks provide additional features to handle these situations?

1 ACCEPTED SOLUTION


Cedric
Valued Contributor

If the driver node fails, your cluster fails. If a worker node fails, Databricks spawns a new worker node to replace it and resumes the workload. It is generally recommended to use an on-demand instance for the driver and spot instances for the worker nodes.
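Following that recommendation, a cluster spec along these lines keeps the driver on an on-demand instance while the workers use spot capacity. This is a minimal sketch using the Databricks Clusters API fields `first_on_demand` (the first N nodes, which include the driver, are placed on on-demand instances) and `availability`; the node type, worker count, and Spark version are placeholder values.

```python
# Sketch of a Databricks Clusters API create payload: on-demand driver,
# spot workers. Node type, worker count, and Spark version are placeholders.
cluster_spec = {
    "cluster_name": "spot-worker-cluster",
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 4,
    "aws_attributes": {
        # The first node (which is the driver) runs on-demand; the rest may be spot.
        "first_on_demand": 1,
        # Fall back to on-demand capacity when spot instances are unavailable.
        "availability": "SPOT_WITH_FALLBACK",
    },
}

# This payload would be sent to POST /api/2.0/clusters/create (for example via
# the Databricks CLI); it is shown here only as a data structure.
print(cluster_spec["aws_attributes"]["first_on_demand"])
```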

As for a comparison between Spark and Databricks, please visit our comparison page (https://databricks.com/spark/comparing-databricks-to-apache-spark).


8 REPLIES

Hubert-Dudek
Esteemed Contributor III
  • A worker failure is not a problem: the data is stored as RDDs, so the dataset survives on the other workers, and Databricks automatically deploys new workers to replace the lost one.
  • A driver failure is critical, as without the driver the whole cluster fails. That's why you shouldn't use spot instances for the driver, though for workers it is not a problem.

So is the data copied to other worker nodes?

Or is the data on that worker node lost?

Cedric
Valued Contributor

If the driver node fails, your cluster fails. If a worker node fails, Databricks spawns a new worker node to replace it and resumes the workload. It is generally recommended to use an on-demand instance for the driver and spot instances for the worker nodes.

As for a comparison between Spark and Databricks, please visit our comparison page (https://databricks.com/spark/comparing-databricks-to-apache-spark).

Prabakar
Esteemed Contributor III

Good one @Cedric Law Hing Ping!

So even if a worker node fails in the middle of a job, will it resume the job?

And what about the data on the worker node?

Is it lost?

Cedric
Valued Contributor
Valued Contributor

Yes, the cluster treats it as a lost worker and schedules the workload on a different worker. Temporary data on the lost worker is gone and has to be recomputed by another worker node.
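The recompute-on-loss behavior described above can be illustrated with a toy model (plain Python, not Spark's actual internals): each partition remembers the transformation chain that produced it, so when a worker's copy is lost, the partition is rebuilt by replaying its lineage against the source data rather than restored from a replica.

```python
# Toy model of RDD lineage-based recovery (illustration only, not Spark internals).

class ToyPartition:
    """A partition that can rebuild its data from its lineage."""

    def __init__(self, source, transforms):
        self.source = source          # original input records
        self.transforms = transforms  # ordered list of functions (the lineage)
        self.data = None              # materialized result held by a worker
        self.recomputes = 0           # how many times we had to (re)compute

    def compute(self):
        result = self.source
        for fn in self.transforms:
            result = [fn(x) for x in result]
        self.data = result
        self.recomputes += 1
        return self.data

    def lose_worker(self):
        # Simulate the worker holding this partition dying: the materialized
        # data is gone, but the lineage (source + transforms) still exists.
        self.data = None

part = ToyPartition(source=[1, 2, 3], transforms=[lambda x: x * 10])
first = part.compute()      # materialize on a worker: [10, 20, 30]
part.lose_worker()          # temporary data on the worker is lost
recovered = part.compute()  # rebuilt from lineage, not from a replica
assert recovered == first and part.recomputes == 2
```

In real Spark, you can reduce this recomputation cost by persisting hot datasets with a replicated storage level or by checkpointing them to durable storage.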

abd
Contributor

Alright, thanks!

Kaniz
Community Manager

Hi @Abdullah Durrani, I'm glad to see that the suggestions provided here helped you. In that case, would you please select the best answer for the community?
