cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Any insights on having a mixed distribution of spot vs on-demand instances within worker nodes?

User16826987838
Contributor
 
1 REPLY 1

aladda
Honored Contributor II

Databricks recommends launching the cluster so that the Spark driver is on an on-demand instance, which allows saving the state of the cluster even after losing spot instance nodes. If you choose to use all spot instances including the driver, any cached data or tables are deleted if you lose the driver instance due to changes in the spot market.

Another important setting is Spot fall back to On-demand. If you are running a hybrid cluster (that is, a mix of on-demand and spot instances), and if spot instance acquisition fails or you lose the spot instances, Databricks falls back to using on-demand instances and provides you with the desired capacity. Without this option you will lose the capacity supplied by the spot instances for the cluster, causing delay or failure of your workload. Databricks recommends setting the mix of on-demand and spot instances in your cluster based on the criticality of jobs, tolerance to delays and failures due to loss of instances, and cost sensitivity for each type of use case.

See this for more details - https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-features

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group