08-22-2022 11:55 PM
Cluster Optimization - how to choose the right cluster for your workload
- Choose the optimal instance/VM type for your workloads. Here are the general recommendations:
  - Storage-optimized instances for large batch jobs and ad-hoc analytics
  - Compute-optimized instances for machine learning and structured streaming workloads
  - Memory-optimized instances for memory-intensive workloads
  - GPU-optimized instances for deep learning workloads
- Enable Photon (AWS | Azure | GCP) on your clusters for up to 80% TCO savings on analytics workloads. Photon is enabled by default for Databricks SQL Warehouses.
- Enable autoscaling for Databricks Clusters (AWS | Azure | GCP), DLT Clusters (AWS | Azure | GCP), and SQL Warehouses (AWS | Azure | GCP) so that nodes are added and removed automatically as the workload changes.
- Use the latest LTS Databricks Runtime (AWS | Azure | GCP). Each new Databricks Runtime includes the latest Spark and Databricks improvements, including performance enhancements. Databricks LTS runtimes are supported for a minimum of 2 years.
- Tune cluster sizes based on your SLAs and cluster utilization. We recommend testing several cluster sizes in a proof of concept to find the configuration that gives you the best price-performance while still meeting your SLAs and expected scalability.
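As a rough illustration of how the instance type, Photon, autoscaling, and runtime recommendations fit together, here is a minimal sketch that builds a cluster specification as a Python dict. The field names (`spark_version`, `node_type_id`, `autoscale`, `runtime_engine`) are meant to follow the Databricks Clusters API, but check them against the current API reference before relying on them. The `build_cluster_spec` helper, the workload labels, the AWS node types, and the runtime version string are all assumptions made for this example, not values from the post.

```python
import json

# Hypothetical mapping from workload category to an example AWS node type.
# The instance families follow the general recommendations in this post;
# the specific sizes are assumptions chosen only for illustration.
NODE_TYPE_BY_WORKLOAD = {
    "batch_or_adhoc": "i3.xlarge",      # storage-optimized
    "ml_or_streaming": "c5.2xlarge",    # compute-optimized
    "memory_intensive": "r5.xlarge",    # memory-optimized
    "deep_learning": "g4dn.xlarge",     # GPU
}


def build_cluster_spec(workload, spark_version, min_workers, max_workers, photon=False):
    """Return a cluster spec dict with autoscaling and an optional Photon engine."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": f"{workload}-cluster",
        "spark_version": spark_version,  # pick the latest LTS runtime in your workspace
        "node_type_id": NODE_TYPE_BY_WORKLOAD[workload],
        # Autoscaling: Databricks adds or removes workers within these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "runtime_engine": "PHOTON" if photon else "STANDARD",
    }


# Placeholder runtime string; substitute the LTS version available to you.
spec = build_cluster_spec(
    workload="batch_or_adhoc",
    spark_version="13.3.x-scala2.12",
    min_workers=2,
    max_workers=8,
    photon=True,
)
print(json.dumps(spec, indent=2))
```

The resulting JSON could be used as the body of a cluster-create request or pasted into a job cluster definition. On Azure or GCP the node type names differ, so the mapping would need to be replaced.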
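The proof-of-concept sizing step can be made concrete with a small comparison: run the same job on a few cluster sizes, record runtime and hourly cost, then pick the cheapest configuration that still meets the SLA. The sketch below assumes you have already collected those measurements. The `TrialResult` class, the `cheapest_within_sla` helper, and all numbers are hypothetical and are not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    name: str               # label for the cluster configuration tested
    runtime_minutes: float  # measured job duration
    cost_per_hour: float    # total hourly cost (compute plus DBUs), in your currency

    @property
    def cost_per_run(self):
        # Cost of one job run at this configuration.
        return self.cost_per_hour * self.runtime_minutes / 60


def cheapest_within_sla(trials, sla_minutes):
    """Return the lowest cost-per-run trial that finishes within the SLA, or None."""
    eligible = [t for t in trials if t.runtime_minutes <= sla_minutes]
    if not eligible:
        return None
    return min(eligible, key=lambda t: t.cost_per_run)


# Made-up measurements for illustration only.
trials = [
    TrialResult("4 workers", runtime_minutes=90, cost_per_hour=10.0),
    TrialResult("8 workers", runtime_minutes=50, cost_per_hour=20.0),
    TrialResult("16 workers", runtime_minutes=30, cost_per_hour=40.0),
]

best = cheapest_within_sla(trials, sla_minutes=60)
print(best.name, round(best.cost_per_run, 2))
```

With these made-up numbers the smallest cluster is the cheapest per run but misses the 60-minute SLA, so the middle configuration is selected. Real measurements often behave differently, which is why testing several sizes is worthwhile.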