What will happen if a driver or worker node fails?

abd
Contributor

What will happen if the driver node fails?

What will happen if one of the worker nodes fails?

Is the behavior the same in Spark and Databricks, or does Databricks provide additional features to handle these situations?


7 REPLIES

Hubert-Dudek
Esteemed Contributor III
  • A worker failure is not a problem: the data is stored as RDDs, so lost partitions can be recomputed from their lineage on the surviving workers, and Databricks automatically deploys replacement workers (see the sketch after this list).
  • A driver failure is critical: without the driver, the whole cluster fails. That's why you shouldn't use spot instances for the driver, though they are fine for workers.
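
To see the lineage Spark relies on for that recomputation, you can call toDebugString() on an RDD. A minimal PySpark sketch (the session setup is illustrative; on Databricks, `spark` is already defined for you):

```python
from pyspark.sql import SparkSession

# Local session for illustration only; on Databricks, `spark` already exists.
spark = SparkSession.builder.master("local[*]").appName("lineage-demo").getOrCreate()

# Build an RDD through a couple of transformations.
rdd = (spark.sparkContext
       .parallelize(range(1_000_000), numSlices=8)
       .map(lambda x: x * 2)
       .filter(lambda x: x % 3 == 0))

# The lineage printed below (a DAG of transformations) is what Spark
# replays to rebuild any partitions lost with a failed worker.
print(rdd.toDebugString().decode("utf-8"))
```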

So is the data copied to other worker nodes?

Or is the data on that worker node lost?

Cedric
Databricks Employee (Accepted Solution)

If the driver node fails, your cluster fails. If a worker node fails, Databricks spawns a new worker node to replace it and resumes the workload. Generally, it is recommended to use an on-demand instance for the driver and spot instances for the worker nodes.

As for a comparison between Spark and Databricks, please visit our comparison page (https://databricks.com/spark/comparing-databricks-to-apache-spark).
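
For reference, here is a rough sketch of what that on-demand-driver / spot-workers setup can look like as a Clusters API call on an AWS workspace. The `first_on_demand` and `availability` fields come from the Databricks REST API's `aws_attributes`; the host, token, and instance values are placeholders:

```python
import requests

# Placeholders; replace with your workspace URL and a real access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# `first_on_demand: 1` keeps the first node (the driver) on an on-demand
# instance, while SPOT_WITH_FALLBACK runs workers on spot instances and
# falls back to on-demand when spot capacity is unavailable.
payload = {
    "cluster_name": "resilient-cluster",
    "spark_version": "11.3.x-scala2.12",   # illustrative runtime version
    "node_type_id": "i3.xlarge",           # illustrative instance type
    "num_workers": 4,
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
print(resp.json())
```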

Prabakar
Databricks Employee

Good one @Cedric Law Hing Ping

So even if a worker node fails in the middle of a job, the job will resume?

And what about the data on that worker node?

Is it lost?

Cedric
Databricks Employee

Yes, the cluster treats it as a lost worker and schedules the workload on a different worker. Temporary data on the failed worker is lost and has to be recomputed by another worker node.
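
If that recomputation is expensive, two common mitigations are replicated caching and checkpointing. A minimal PySpark sketch (the session setup, checkpoint path, and data sizes are illustrative):

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("resilience-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1_000_000)).map(lambda x: x * x)

# MEMORY_AND_DISK_2 keeps two replicas of each cached partition on
# different nodes, so losing one worker does not force a recompute.
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

# Checkpointing truncates the lineage by materializing the RDD to
# reliable storage, capping the cost of any future recompute.
sc.setCheckpointDir("/tmp/checkpoints")  # illustrative path
rdd.checkpoint()

rdd.count()  # an action, to materialize both the cache and the checkpoint
```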

abd
Contributor

Alright, thanks!
