cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Any insights on having a mixed distribution of spot vs on-demand instances within worker nodes?

User16826987838
Contributor
 
1 REPLY 1

Anand_Ladda
Honored Contributor II

Databricks recommends launching the cluster so that the Spark driver is on an on-demand instance, which allows saving the state of the cluster even after losing spot instance nodes. If you choose to use all spot instances including the driver, any cached data or tables are deleted if you lose the driver instance due to changes in the spot market.

Another important setting is Spot fall back to On-demand. If you are running a hybrid cluster (that is, a mix of on-demand and spot instances), and if spot instance acquisition fails or you lose the spot instances, Databricks falls back to using on-demand instances and provides you with the desired capacity. Without this option you will lose the capacity supplied by the spot instances for the cluster, causing delay or failure of your workload. Databricks recommends setting the mix of on-demand and spot instances in your cluster based on the criticality of jobs, tolerance to delays and failures due to loss of instances, and cost sensitivity for each type of use case.

See this for more details - https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-features

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.