Spot instances - Best practice

Anonymous
Not applicable

We are having trouble running our jobs on spot instances that AWS reclaims during shuffles. Is there any documentation or best-practice guidance on this? We went through this article, but is there anything else to keep in mind?

sean_owen
Databricks Employee

What are you setting your bid price to? I think it's reasonable to set it to 100% of the on-demand price, or else you may get evicted more frequently. It's also a good idea for a job like this to make only _some_ of the executors spot instances, so that you never lose a critical mass of executors while still saving some money.
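As a rough illustration of that mix, here is a minimal sketch of a cluster spec built as a Python dict, assuming the `aws_attributes` fields of the Databricks Clusters API (`first_on_demand`, `availability`, `spot_bid_price_percent`). The helper name `spot_cluster_spec`, the cluster name, the Spark version, and the node type are placeholders chosen for the example, not values from this thread.

```python
import json


def spot_cluster_spec(num_workers: int, on_demand_nodes: int, bid_percent: int = 100) -> dict:
    """Build a cluster spec whose first `on_demand_nodes` nodes are on-demand.

    Hypothetical helper for illustration only. The first node placed is the
    driver, so on_demand_nodes=3 means the driver plus two workers.
    """
    if on_demand_nodes < 1:
        raise ValueError("keep at least the driver on an on-demand instance")
    if on_demand_nodes > num_workers + 1:
        raise ValueError("on_demand_nodes exceeds the total node count")
    return {
        "cluster_name": "shuffle-heavy-job",   # placeholder name
        "spark_version": "13.3.x-scala2.12",   # placeholder runtime version
        "node_type_id": "i3.xlarge",           # placeholder instance type
        "num_workers": num_workers,
        "aws_attributes": {
            # Nodes beyond this count are requested as spot instances.
            "first_on_demand": on_demand_nodes,
            # Fall back to on-demand if spot capacity cannot be acquired.
            "availability": "SPOT_WITH_FALLBACK",
            # Bid as a percentage of the on-demand price.
            "spot_bid_price_percent": bid_percent,
        },
    }


spec = spot_cluster_spec(num_workers=8, on_demand_nodes=3)
print(json.dumps(spec, indent=2))
```

With eight workers and `on_demand_nodes=3`, the driver and two workers stay on-demand and the remaining six workers run on spot. How many on-demand nodes count as a "critical mass" depends on the job, so treat the numbers here as an assumption to tune.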


User16783853906
Databricks Employee

Due to recent changes in the AWS spot marketplace, legacy techniques such as a higher spot bid price (>100%) are ineffective at retaining an acquired spot node. Instances can be reclaimed with only two minutes' notice, causing workloads to fail.

To mitigate this, we should encourage customers to:

  1. Use multiple instance families when creating their cluster or pool
  2. Provision the master (driver) node from an on-demand pool
  3. Use an appropriate spot allocation strategy, such as CAPACITY_OPTIMIZED or LOW_PRICE
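Point 2 above could be sketched roughly as follows, assuming the `instance_pool_id` and `driver_instance_pool_id` fields of the Databricks Clusters API. The helper name `pooled_cluster_spec` and both pool IDs are hypothetical, and the pools themselves (a spot pool for workers, an on-demand pool for the driver) are assumed to exist already. The instance families and the allocation strategy are set when the pool or fleet is created, so they are not shown here.

```python
import json


def pooled_cluster_spec(worker_pool_id: str, driver_pool_id: str, num_workers: int) -> dict:
    """Build a cluster spec that takes workers and the driver from separate pools.

    Hypothetical helper for illustration only. The instance type comes from
    each pool, so no node_type_id is set on the cluster.
    """
    if worker_pool_id == driver_pool_id:
        raise ValueError("use a separate on-demand pool for the driver")
    return {
        "cluster_name": "spot-workers-on-demand-driver",  # placeholder name
        "spark_version": "13.3.x-scala2.12",              # placeholder runtime version
        "num_workers": num_workers,
        # Workers come from a pool assumed to be configured for spot instances.
        "instance_pool_id": worker_pool_id,
        # The driver comes from a pool assumed to be configured for on-demand.
        "driver_instance_pool_id": driver_pool_id,
    }


pooled = pooled_cluster_spec(
    worker_pool_id="pool-spot-workers",      # hypothetical pool ID
    driver_pool_id="pool-on-demand-driver",  # hypothetical pool ID
    num_workers=8,
)
print(json.dumps(pooled, indent=2))
```

Keeping the driver on on-demand capacity means a spot reclamation costs only executors, which Spark can replace, rather than the whole application.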