cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Any insights on having a mixed distribution of spot vs on-demand instances within worker nodes?

User16826987838
Contributor
 
1 REPLY 1

aladda
Databricks Employee
Databricks Employee

Databricks recommends launching the cluster so that the Spark driver is on an on-demand instance, which allows saving the state of the cluster even after losing spot instance nodes. If you choose to use all spot instances including the driver, any cached data or tables are deleted if you lose the driver instance due to changes in the spot market.

Another important setting is Spot fall back to On-demand. If you are running a hybrid cluster (that is, a mix of on-demand and spot instances), and if spot instance acquisition fails or you lose the spot instances, Databricks falls back to using on-demand instances and provides you with the desired capacity. Without this option you will lose the capacity supplied by the spot instances for the cluster, causing delay or failure of your workload. Databricks recommends setting the mix of on-demand and spot instances in your cluster based on the criticality of jobs, tolerance to delays and failures due to loss of instances, and cost sensitivity for each type of use case.

See this for more details - https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-features

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now