Looking for Suggestions: Designed a Decision Tree to Recommend Optimal VM Types for Workloads

saicharandeepb
Contributor

Hi everyone!

I recently designed a decision tree model to help recommend the most suitable VM types for different kinds of workloads in Databricks.

[Image: decision tree flowchart, saicharandeepb_0-1762515348166.png]


Thought Process Behind the Design:
Choosing the optimal virtual machine (VM) for a workload depends heavily on:

  • The type of operations being performed (compute-heavy, memory-intensive, or storage-heavy)
  • The size of the data being handled
  • And of course, cost considerations

Based on this flow, users can take a trial-and-error approach: run the workload, monitor Spark metrics, and check whether the current VM type or worker configuration is optimal.
If metrics indicate CPU, memory, or disk bottlenecks, the VM size or type can be adjusted to better suit the workload.

Moreover, if Spark metrics show that both CPU and memory utilization stay consistently below 50%, switching to general-purpose compute VMs is recommended to reduce cost and avoid over-provisioning.
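
To make the tuning loop above concrete, here is a minimal Python sketch of how I picture the check. It is only an illustration: the `UtilizationSample` class, the `recommend` function, and the 85% bottleneck threshold are hypothetical names and values I made up for this example, and the samples are assumed to have been collected already from whatever metrics source you use. Only the 50% under-utilization rule comes from the flow described above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UtilizationSample:
    """One utilization reading for a cluster, in percent (0-100)."""
    cpu_pct: float
    memory_pct: float
    disk_pct: float


# Assumption: sustained average usage above this value counts as a bottleneck.
BOTTLENECK_PCT = 85.0
# From the post: CPU and memory both consistently below 50% means over-provisioned.
UNDERUSED_PCT = 50.0


def recommend(samples: list[UtilizationSample]) -> str:
    """Return a coarse recommendation for the next VM family to try."""
    if not samples:
        raise ValueError("need at least one utilization sample")

    avg_cpu = mean(s.cpu_pct for s in samples)
    avg_mem = mean(s.memory_pct for s in samples)
    avg_disk = mean(s.disk_pct for s in samples)

    # Bottlenecks first: move toward the VM family that relieves the pressure.
    if avg_cpu >= BOTTLENECK_PCT:
        return "compute-optimized"
    if avg_mem >= BOTTLENECK_PCT:
        return "memory-optimized"
    if avg_disk >= BOTTLENECK_PCT:
        return "storage-optimized"

    # "Consistently below 50%": every sample must be under the threshold,
    # not just the average.
    if all(s.cpu_pct < UNDERUSED_PCT and s.memory_pct < UNDERUSED_PCT
           for s in samples):
        return "general-purpose"

    return "keep current"


if __name__ == "__main__":
    idle = [UtilizationSample(30, 40, 10), UtilizationSample(35, 45, 12)]
    print(recommend(idle))
```

Checking bottlenecks before under-utilization is a deliberate choice in this sketch: a disk-bound job can show low CPU and memory, and I would not want it pushed to general-purpose VMs in that case.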

I’d love feedback from the community on:

  • How can this decision tree be further evolved or refined?
  • What would be the best way to incorporate recommendations for general-purpose VMs directly into this flow?

Your insights will help make this decision tree more dynamic and practical for real-world Databricks workloads!

Thanks in advance for your thoughts and suggestions!