Re: Tuning shuffle partitions

sajith_appukutt · ‎06-21-2021

AQE (enabled by default from 7.3 LTS + onwards) adjusts the shuffle partition number automatically at each stage of the query, based on the size of the map-side shuffle output. So as data size grows or shrinks over different stages, the task size will remain roughly the same, neither too big nor too small.

However it does not set the map-side partition number automatically today. Hence it is recommended to set initial shuffle partition number through the SQL config spark.sql.shuffle.partitions. Now Databricks has a feature to “Auto-Optimized Shuffle” ( spark.databricks.adaptive.autoOptimizeShuffle.enabled) which automates the need for setting this manually. For the vast majority of use cases, enabling this auto mode would be sufficient . However, if you want to hand tune you could set spark.sql.shuffle.partitions manually.

View solution in original post