06-28-2022 06:23 AM
06-28-2022 07:34 AM
If the driver node fails, your cluster will fail. If a worker node fails, Databricks will spawn a new worker node to replace it and resume the workload. Generally it is recommended to use an on-demand instance for your driver and spot instances for your worker nodes.
As for a comparison between Spark and Databricks, please visit our comparison page (https://databricks.com/spark/comparing-databricks-to-apache-spark).
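The on-demand-driver/spot-worker setup above can be expressed in a cluster definition. Below is a minimal sketch of a Clusters API payload (AWS) as a Python dict; the cluster name, runtime version, and instance type are placeholder examples, so substitute values valid for your workspace.

```python
# Sketch of a Databricks Clusters API payload (AWS): the driver stays on an
# on-demand instance while the workers run on spot capacity.
cluster_spec = {
    "cluster_name": "spot-workers-example",   # hypothetical name
    "spark_version": "10.4.x-scala2.12",      # example runtime version
    "node_type_id": "i3.xlarge",              # example instance type
    "num_workers": 4,
    "aws_attributes": {
        # first_on_demand = 1 keeps the first node launched (the driver)
        # on-demand; the remaining nodes use spot instances, falling back
        # to on-demand if spot capacity is unavailable.
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
    },
}
```

This way a spot reclamation can only take out workers (which Databricks replaces), never the driver.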
06-28-2022 08:46 AM
So the data is copied on other worker nodes?
Or the data on that worker node is lost?
06-28-2022 07:43 AM
Good one @Cedric Law Hing Ping
06-28-2022 08:53 AM
So even if a worker node fails mid-job, it will resume the job?
And what about the data on that worker node?
Is it lost?
06-28-2022 09:15 AM
Yes, the cluster will treat it as a lost worker and schedule the workload on a different worker. Temporary data on the failed worker is lost and has to be recomputed by another worker node.
06-28-2022 09:27 AM
Alright, thanks!