- 791 Views
- 0 replies
- 2 kudos
Hi, I have several clusters, some with a 45% max spot price and some more important ones with a higher value. I want to know the best way to configure this but cannot find anything (a value showing how many nodes of the last run were on-demand will do the t...
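A per-cluster maximum spot price is normally expressed in the cluster definition itself. As a hedged sketch (the cluster names below are made up; `spot_bid_price_percent`, `availability`, and `first_on_demand` are the relevant `aws_attributes` fields in a Databricks cluster spec):

```python
# Hypothetical sketch: two cluster specs with different max spot prices,
# expressed as Databricks Clusters API payloads. Cluster names are
# placeholders for illustration.
analytics_cluster = {
    "cluster_name": "analytics-low-priority",   # hypothetical name
    "aws_attributes": {
        "availability": "SPOT_WITH_FALLBACK",   # fall back to on-demand if spot is unavailable
        "spot_bid_price_percent": 45,           # bid up to 45% of the on-demand price
        "first_on_demand": 1,                   # keep the driver on-demand
    },
}

critical_cluster = {
    "cluster_name": "etl-critical",             # hypothetical name
    "aws_attributes": {
        "availability": "SPOT_WITH_FALLBACK",
        "spot_bid_price_percent": 100,          # more important workload: bid the full on-demand price
        "first_on_demand": 1,
    },
}
```

Whether a past run used spot or on-demand nodes would then be a question for the cluster event log rather than the spec itself.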
- 930 Views
- 0 replies
- 3 kudos
Tips on Reducing Cloud Compute Infrastructure Costs for Azure VM, AWS EC2, and GCP GKE on Databricks
Databricks takes advantage of the latest Azure VM / AWS EC2 / GKE VM/instance types to ensure you get the best price performance for your workloads on...
- 2219 Views
- 2 replies
- 2 kudos
We are having difficulties running our jobs with spot instances that get reclaimed by AWS during shuffles. Do we have any documentation / best practices around this? We went through this article, but is there anything else to keep in mind?
Latest Reply
Due to recent changes in the AWS spot marketplace, legacy techniques like a higher spot bid price (>100%) are ineffective at retaining acquired spot nodes, and instances can be lost on two minutes' notice, causing workloads to fail. To mitigate this, w...
1 More Replies
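The reply above is truncated, but one commonly cited mitigation for shuffle loss on spot reclamation is Spark's graceful decommissioning (available in open-source Spark 3.1+), which migrates shuffle and cached blocks off a node before the reclamation completes. A hedged sketch of the relevant settings, whose availability on a given Databricks runtime should be checked against its release notes:

```python
# Hedged sketch: open-source Spark graceful-decommissioning settings.
# These ask Spark to migrate shuffle and RDD cache blocks away from an
# executor that is being decommissioned (e.g. on a spot interruption
# notice) instead of simply losing them.
spark_conf = {
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
}
```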
- 2734 Views
- 3 replies
- 0 kudos
When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools, and we are noticing jobs failing with the exception below when there is a driver spot loss. Please share best practices around using fleet pools with 1...
Latest Reply
In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
2 More Replies
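The hybrid-pools pattern described above can be sketched as a cluster spec that draws the driver and workers from separate pools. The pool IDs below are placeholders; `driver_instance_pool_id` is the cluster-spec field that selects a distinct pool for the driver:

```python
# Hedged sketch: a Databricks cluster spec using separate pools, so the
# driver comes from an on-demand pool while workers come from a
# spot-backed pool. Pool IDs are placeholders for illustration.
cluster_spec = {
    "cluster_name": "jobs-cluster",               # hypothetical name
    "instance_pool_id": "pool-spot-workers",      # placeholder: spot-backed worker pool
    "driver_instance_pool_id": "pool-ondemand",   # placeholder: on-demand pool for the driver
    "num_workers": 8,
}
```

Losing a worker from the spot pool is recoverable; losing the driver kills the job, which is why only the driver needs the on-demand guarantee.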
- 2219 Views
- 1 reply
- 0 kudos
Does the query have to be re-run from the start, or can it continue? I'm trying to evaluate the risk of using spot instances for production jobs.
Latest Reply
If a spot instance is reclaimed in the middle of a job, Spark will treat it as a lost worker. The Spark engine automatically retries the tasks from the lost worker on other available workers, so the query does not have to start over if indivi...
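How much spot churn a query can absorb before it does fail is governed by Spark's retry settings. A hedged sketch of the two main knobs, shown at their Spark defaults:

```python
# Hedged sketch: Spark retry settings that bound recovery from lost
# workers. A task is retried on another executor up to maxFailures
# times; a whole stage (e.g. after repeated shuffle-fetch failures
# from a reclaimed node) is re-attempted up to maxConsecutiveAttempts
# times before the job is aborted. Values shown are the defaults.
spark_conf = {
    "spark.task.maxFailures": "4",
    "spark.stage.maxConsecutiveAttempts": "4",
}
```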