isaac_gritz
Databricks Employee

Cluster Optimization - how to choose the right cluster for your workload

  1. Choose the optimal instance/VM type for your workloads. Here are the general recommendations:
    1. Storage-optimized instances for large batch jobs and ad-hoc analytics
    2. Compute-optimized instances for machine learning and Structured Streaming workloads
    3. Memory-optimized instances for memory-intensive workloads
    4. GPU-optimized instances for deep learning workloads
  2. Enable Photon (AWS | Azure | GCP) on your clusters for up to 80% TCO savings on analytics workloads. Photon is enabled by default for Databricks SQL Warehouses.
  3. Enable autoscaling for Databricks clusters (AWS | Azure | GCP), Delta Live Tables (DLT) clusters (AWS | Azure | GCP), and SQL warehouses (AWS | Azure | GCP) so that nodes are added and removed automatically as the workload changes.
  4. Use the latest LTS Databricks Runtime (AWS | Azure | GCP). Each Databricks Runtime release bundles the latest Spark and Databricks advancements, including performance enhancements. Databricks LTS runtimes are supported for a minimum of 2 years.
  5. Tune cluster sizes based on your SLAs and cluster utilization. We recommend testing several cluster sizes in a proof of concept to find the configuration that gives the best price-performance while meeting your SLAs and expected scalability.
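
As a rough illustration of how these recommendations might fit together, here is a minimal Python sketch that builds a cluster spec using field names from the Databricks Clusters API (`spark_version`, `node_type_id`, `autoscale`, `runtime_engine`). The workload labels, the AWS instance names, the default runtime string, and both helper functions are assumptions made for this example, not official guidance. Check which node types and runtime versions your own workspace offers, and note that GPU workloads typically need an ML runtime instead of the default used here.

```python
# Hypothetical mapping from workload type to an example AWS node type.
# The instance names are illustrative only; pick ones available in your workspace.
NODE_TYPE_BY_WORKLOAD = {
    "batch_or_adhoc": "i3.xlarge",     # storage-optimized
    "ml_or_streaming": "c5.2xlarge",   # compute-optimized
    "memory_intensive": "r5.xlarge",   # memory-optimized
    "deep_learning": "g4dn.xlarge",    # GPU
}


def build_cluster_spec(workload, min_workers, max_workers,
                       spark_version="13.3.x-scala2.12", photon=True):
    """Return a dict shaped like a Clusters API create request.

    spark_version defaults to an example LTS runtime string (an assumption);
    replace it with the latest LTS version listed in your workspace.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    spec = {
        "cluster_name": f"{workload}-cluster",
        "spark_version": spark_version,
        "node_type_id": NODE_TYPE_BY_WORKLOAD[workload],
        # Autoscaling: Databricks adds or removes workers within this range.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }
    if photon:
        spec["runtime_engine"] = "PHOTON"
    return spec


def cheapest_meeting_sla(trials, sla_minutes):
    """Pick the lowest-cost proof-of-concept trial that meets the SLA.

    trials is a list of (label, runtime_minutes, cost_usd) tuples that you
    measured yourself. Returns None if no trial meets the SLA.
    """
    within_sla = [t for t in trials if t[1] <= sla_minutes]
    return min(within_sla, key=lambda t: t[2]) if within_sla else None


spec = build_cluster_spec("batch_or_adhoc", min_workers=2, max_workers=8)

# Made-up numbers purely to show the selection logic.
poc_trials = [("small", 50, 10.0), ("medium", 28, 12.0), ("large", 15, 18.0)]
best = cheapest_meeting_sla(poc_trials, sla_minutes=30)
```

With the made-up trial numbers above, the "small" cluster is cheapest but misses the 30-minute SLA, so the helper selects "medium". The resulting `spec` dict could then be sent to the Clusters API or adapted into a job cluster definition.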