Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

alejandrofm
by Valued Contributor
  • 791 Views
  • 0 replies
  • 2 kudos

How can I know if an instance has fallen back to On-demand?

Hi, I have several clusters, some with a 45% max spot price and some more important ones with a higher value. I want to know the best way to configure this but cannot find anything (a value of how many nodes of the last run were On-demand will do the t...

isaac_gritz
by Databricks Employee
  • 930 Views
  • 0 replies
  • 3 kudos

Optimize Azure VM / AWS EC2 / GKE Cloud Infrastructure Costs

Tips on Reducing Cloud Compute Infrastructure Costs for Azure VM, AWS EC2, and GCP GKE on Databricks. Databricks takes advantage of the latest Azure VM / AWS EC2 / GKE VM/instance types to ensure you get the best price performance for your workloads on...

Anonymous
by Not applicable
  • 2219 Views
  • 2 replies
  • 2 kudos

Resolved! Spot instances - Best practice

We are having difficulty running our jobs on spot instances that get reclaimed by AWS during shuffles. Is there any documentation or best practice around this? We went through this article, but is there anything else to keep in mind?

Latest Reply
User16783853906
Contributor III
  • 2 kudos

Due to recent changes in the AWS spot marketplace, legacy techniques such as a higher spot bid price (>100%) are ineffective at retaining an acquired spot node, and instances can be lost with 2 minutes' notice, causing workloads to fail. To mitigate this, w...
1 More Replies
User16783853906
by Contributor III
  • 2734 Views
  • 3 replies
  • 0 kudos

Resolved! Frequent spot loss of driver nodes resulting in failed jobs when using spot fleet pools

When using spot fleet pools to schedule jobs, driver and worker nodes are provisioned from the spot pools and we are noticing jobs failing with the below exception when there is a driver spot loss. Share best practices around using fleet pools with 1...

Latest Reply
User16783853906
Contributor III
  • 0 kudos

In this scenario, the driver node is reclaimed by AWS. Databricks has started a preview of the hybrid pools feature, which allows you to provision the driver node from a different pool. We recommend using an on-demand pool for the driver node to improve reliability i...
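
As a rough illustration of the reply above, here is a minimal sketch of a cluster spec that draws workers from one pool and the driver from another. It assumes the Databricks Clusters API fields `instance_pool_id` and `driver_instance_pool_id`. The pool IDs, cluster name, and runtime version are hypothetical placeholders, and the helper `build_cluster_spec` is not part of any Databricks library.

```python
import json


def build_cluster_spec(worker_pool_id: str, driver_pool_id: str, num_workers: int) -> dict:
    """Build a cluster spec with workers and driver drawn from separate pools.

    Assumption: the worker pool is backed by spot instances and the driver
    pool by on-demand instances. That choice is made on the pools themselves,
    not in this spec.
    """
    return {
        "cluster_name": "spot-workers-on-demand-driver",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",              # placeholder runtime
        "num_workers": num_workers,
        "instance_pool_id": worker_pool_id,               # workers come from here
        "driver_instance_pool_id": driver_pool_id,        # driver comes from here
    }


# Hypothetical pool IDs, for illustration only.
spec = build_cluster_spec("spot-worker-pool-id", "on-demand-driver-pool-id", 4)
print(json.dumps(spec, indent=2))
```

No `node_type_id` appears in the spec because the instance type is taken from the pools when pool IDs are supplied.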
2 More Replies
User16826992666
by Valued Contributor
  • 2219 Views
  • 1 reply
  • 0 kudos

What happens if a spot instance worker is lost in the middle of a query?

Does the query have to be re-run from the start, or can it continue? Trying to evaluate what risk there is by using spot instances for production jobs

Latest Reply
User16826992666
Valued Contributor
  • 0 kudos

If a spot instance is reclaimed in the middle of a job, Spark treats it as a lost worker. The Spark engine automatically retries the tasks from the lost worker on other available workers. So the query does not have to start over if indivi...
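
The retry behaviour described above can be pictured with a toy model. This is not Spark's scheduler, only a simplified sketch: tasks are assigned round-robin, one worker is lost, and only that worker's tasks are rerun on the survivors. The function `run_stage` and the worker names are hypothetical, and the model ignores shuffle output that may also need to be recomputed in a real job.

```python
def run_stage(tasks, workers, lost_worker=None):
    """Toy model of task retry after a worker is lost.

    Returns the final task -> worker assignment and the list of tasks
    that had to be rerun. Tasks on surviving workers are left untouched.
    """
    # Initial round-robin placement of tasks on workers.
    assignment = {task: workers[i % len(workers)] for i, task in enumerate(tasks)}
    if lost_worker is None:
        return assignment, []

    survivors = [w for w in workers if w != lost_worker]
    if not survivors:
        raise RuntimeError("no workers left to retry on")

    # Only tasks that were on the lost worker are rescheduled.
    retried = [t for t, w in assignment.items() if w == lost_worker]
    for i, task in enumerate(retried):
        assignment[task] = survivors[i % len(survivors)]
    return assignment, retried


final_assignment, retried = run_stage(
    tasks=list(range(6)), workers=["w1", "w2", "w3"], lost_worker="w2"
)
print("retried tasks:", retried)
print("final assignment:", final_assignment)
```

In this model, four of the six tasks never move, which is the sense in which the query does not start over.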