07-02-2024 01:42 PM - edited 07-02-2024 01:43 PM
Hi,
autoOptimizeShuffle.enabled

The autoOptimizeShuffle.enabled configuration in Databricks is designed to automatically choose the number of shuffle partitions based on the data size and the number of available executors. This helps avoid the common problem of having too many partitions (lots of tiny tasks and scheduling overhead) or too few (oversized tasks that are slow and may spill to disk).

Best Practices

Understand Your Workload: Before relying on auto-optimization, understand the characteristics of your workload. Auto-optimization may not suit every type of workload, so compare a representative job with the setting on and off.

Enable Auto Optimization: If you decide to use autoOptimizeShuffle.enabled, enable it through the Spark configuration of your Databricks session. This can be done as follows:
spark.conf.set("spark.databricks.adaptive.autoOptimizeShuffle.enabled", "true")
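If you want to try the setting on a single job before turning it on for a whole notebook or cluster, something like the following might help. It is only a minimal sketch: `temporary_conf` is a hypothetical helper name (not a Databricks or Spark API), and it assumes the object you pass behaves like `spark.conf`, with `get(key, default)`, `set(key, value)` and `unset(key)` methods.

```python
from contextlib import contextmanager


@contextmanager
def temporary_conf(conf, key, value):
    """Set `key` to `value` inside a with-block, then restore the old state.

    `conf` is assumed to expose get(key, default), set(key, value) and
    unset(key), as spark.conf does in PySpark. Hypothetical helper.
    """
    previous = conf.get(key, None)  # None means the key was not set before
    conf.set(key, value)
    try:
        yield
    finally:
        if previous is None:
            conf.unset(key)           # key was unset before: remove it again
        else:
            conf.set(key, previous)   # otherwise put the old value back
```

In a notebook you would wrap the query you want to compare, for example `with temporary_conf(spark.conf, key, "true"): df.groupBy("k").count().collect()`, where `key` is the configuration name from the line above and `df` is your own DataFrame. Then check the shuffle partition counts in the Spark UI for the run with and without the setting.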
Mehdi Tajmouati
mehdi.tajmouati@wytasoft.com
06 68 23 18 42
www.wytasoft.com