<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: What is the difference between coalesce and repartition when it comes to shuffle partitions in spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-coalesce-and-repartition-when-it/m-p/22126#M15119</link>
    <description>&lt;P&gt;Coalesce merges existing partitions into a smaller number of larger partitions without a full shuffle. Use coalesce when you want to reduce the number of partitions (and therefore tasks) without disturbing the existing sort order, for example when you want to write a single CSV output file instead of multiple part files.&lt;/P&gt;&lt;P&gt;Use repartition when you want a full shuffle that changes the number of partitions, either up or down. Common use cases for repartition are removing skew in file sizes and starting out with a different (for example smaller) number of partitions than Spark's default.&lt;/P&gt;</description>
    <pubDate>Sun, 20 Jun 2021 03:51:39 GMT</pubDate>
    <dc:creator>aladda</dc:creator>
    <dc:date>2021-06-20T03:51:39Z</dc:date>
    <item>
      <title>What is the difference between coalesce and repartition when it comes to shuffle partitions in spark</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-coalesce-and-repartition-when-it/m-p/22125#M15118</link>
      <description />
      <pubDate>Sun, 20 Jun 2021 03:49:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-coalesce-and-repartition-when-it/m-p/22125#M15118</guid>
      <dc:creator>aladda</dc:creator>
      <dc:date>2021-06-20T03:49:44Z</dc:date>
    </item>
    <item>
      <title>Re: What is the difference between coalesce and repartition when it comes to shuffle partitions in spark</title>
      <link>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-coalesce-and-repartition-when-it/m-p/22126#M15119</link>
      <description>&lt;P&gt;Coalesce merges existing partitions into a smaller number of larger partitions without a full shuffle. Use coalesce when you want to reduce the number of partitions (and therefore tasks) without disturbing the existing sort order, for example when you want to write a single CSV output file instead of multiple part files.&lt;/P&gt;&lt;P&gt;Use repartition when you want a full shuffle that changes the number of partitions, either up or down. Common use cases for repartition are removing skew in file sizes and starting out with a different (for example smaller) number of partitions than Spark's default.&lt;/P&gt;</description>
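      <content:encoded><![CDATA[
To make the contrast in the reply concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark itself and not something the original poster wrote: the functions `coalesce` and `repartition` below are hypothetical stand-ins that only mimic the general behaviour described above, and the sample data is made up. In PySpark the corresponding calls would be `df.coalesce(n)` and `df.repartition(n)`.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole neighbouring partitions into n groups.

    Records are never split apart, so order inside each input partition
    is preserved, but any size skew is carried into the result.
    """
    if n >= len(partitions):
        # Coalesce only reduces the partition count; otherwise nothing changes.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Input partition i is appended, as a whole, to one output group.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: full shuffle that deals every record round-robin."""
    out: List[Partition] = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            # Every record may move, which evens out the sizes.
            out[position % n].append(record)
            position += 1
    return out


# Made-up, deliberately skewed input: one large partition and three small ones.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

coalesced = coalesce(skewed, 2)
reshuffled = repartition(skewed, 2)

print("coalesce sizes:   ", [len(p) for p in coalesced])
print("repartition sizes:", [len(p) for p in reshuffled])
```

In this toy run the coalesced result keeps records in their original order but stays uneven, while the repartitioned result is evenly sized at the cost of moving every record. That is the trade-off the reply describes: coalesce for cheaply reducing the partition count, repartition when skew needs to be removed.
]]></content:encoded>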
      <pubDate>Sun, 20 Jun 2021 03:51:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/what-is-the-difference-between-coalesce-and-repartition-when-it/m-p/22126#M15119</guid>
      <dc:creator>aladda</dc:creator>
      <dc:date>2021-06-20T03:51:39Z</dc:date>
    </item>
  </channel>
</rss>

