06-19-2021 08:49 PM
Accepted Solutions
06-19-2021 08:51 PM
Coalesce essentially merges multiple partitions into fewer, larger partitions. Use coalesce when you want to reduce the number of partitions (and therefore tasks) without a full shuffle and without impacting sort order. For example, use it when you want to write out a single CSV file instead of multiple part files.
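To make the "merge adjacent partitions" idea concrete, here is a toy model in plain Python, not Spark's actual implementation. It assumes partitions are lists of rows and that coalescing simply concatenates contiguous partitions. Spark's real coalescer also takes data locality into account. `toy_coalesce` is a hypothetical helper for illustration only.

```python
from typing import List, TypeVar

T = TypeVar("T")


def toy_coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of coalesce: merge adjacent partitions into at most n groups.

    Rows are never redistributed between groups (no shuffle), so their
    relative order is preserved. Hypothetical helper, not a Spark API.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if n >= len(partitions):
        # Coalesce without a shuffle can only reduce the partition count.
        return [list(p) for p in partitions]
    merged: List[List[T]] = []
    # Split the parent partitions into n contiguous, near-equal-size groups.
    size, extra = divmod(len(partitions), n)
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        group: List[T] = []
        for p in partitions[start:end]:
            group.extend(p)  # concatenate in order
        merged.append(group)
        start = end
    return merged


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(toy_coalesce(parts, 2))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(toy_coalesce(parts, 1))  # a single partition, analogous to one output CSV file
```

Asking for more partitions than exist returns the input unchanged, which mirrors why coalesce is only useful for reducing the partition count.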
Use repartition when you want a shuffle that changes the number of partitions, either up or down. A common use case for repartition is to remove skew in file sizes, or to start with a smaller or different number of partitions than Spark's default.
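For contrast, the following toy model in plain Python sketches why repartition can fix skew: every row is reassigned to a new partition, which is what a shuffle does. It assumes a simple deterministic round-robin assignment. Spark's `repartition(n)` on a DataFrame uses round-robin partitioning too, but starts at a random offset per input partition, and `repartition` with columns hashes instead. `toy_repartition` is a hypothetical helper, not a Spark API.

```python
from typing import List, TypeVar

T = TypeVar("T")


def toy_repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of repartition: deal every row round-robin into n partitions.

    Rows move between partitions (a shuffle), so the partition count can go
    up or down and skewed inputs come out evenly sized. Hypothetical helper.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[List[T]] = [[] for _ in range(n)]
    i = 0
    for p in partitions:
        for row in p:
            out[i % n].append(row)  # simplified: no random starting offset
            i += 1
    return out


skewed = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [10], []]
print([len(p) for p in skewed])                      # [9, 1, 0]
print([len(p) for p in toy_repartition(skewed, 3)])  # [4, 3, 3]
```

Unlike the coalesce case, rows from one input partition end up scattered across outputs, which is the cost (a shuffle) paid for balanced partition sizes.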