<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Community edition cluster - UI shows incorrect cores in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135771#M50428</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am a Community Edition user, which gives me a cluster (as per the image below):&lt;/P&gt;&lt;P&gt;15 GB of memory and 2 cores, with one driver node ONLY.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_1-1761165245208.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20968i7C5200ECC8513C03/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_1-1761165245208.png" alt="Mits11_1-1761165245208.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;However, when I read a CSV file of 181 MB:&lt;/P&gt;&lt;P&gt;1) It generates 8 partitions.&lt;BR /&gt;The default maxPartitionBytes is 128 MB, so as per my understanding there should be 2 partitions:&lt;/P&gt;&lt;P&gt;181/128 ~ 2&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_3-1761165673802.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20970iA65DDEF55AD4C715/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_3-1761165673802.png" alt="Mits11_3-1761165673802.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2) The Spark UI shows 8 cores instead of 2.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_2-1761165566839.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20969iA7268D3E55F3B3A5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_2-1761165566839.png" alt="Mits11_2-1761165566839.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;These 2 observations are my queries.&lt;/P&gt;</description>
    <pubDate>Wed, 22 Oct 2025 20:46:25 GMT</pubDate>
    <dc:creator>Mits11</dc:creator>
    <dc:date>2025-10-22T20:46:25Z</dc:date>
    <item>
      <title>Community edition cluster - UI shows incorrect cores</title>
      <link>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135771#M50428</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am a Community Edition user, which gives me a cluster (as per the image below):&lt;/P&gt;&lt;P&gt;15 GB of memory and 2 cores, with one driver node ONLY.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_1-1761165245208.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20968i7C5200ECC8513C03/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_1-1761165245208.png" alt="Mits11_1-1761165245208.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;However, when I read a CSV file of 181 MB:&lt;/P&gt;&lt;P&gt;1) It generates 8 partitions.&lt;BR /&gt;The default maxPartitionBytes is 128 MB, so as per my understanding there should be 2 partitions:&lt;/P&gt;&lt;P&gt;181/128 ~ 2&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_3-1761165673802.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20970iA65DDEF55AD4C715/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_3-1761165673802.png" alt="Mits11_3-1761165673802.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2) The Spark UI shows 8 cores instead of 2.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_2-1761165566839.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20969iA7268D3E55F3B3A5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_2-1761165566839.png" alt="Mits11_2-1761165566839.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;These 2 observations are my queries.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Oct 2025 20:46:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135771#M50428</guid>
      <dc:creator>Mits11</dc:creator>
      <dc:date>2025-10-22T20:46:25Z</dc:date>
    </item>
    <item>
      <title>Re: Community edition cluster - UI shows incorrect cores</title>
      <link>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135856#M50445</link>
      <description>&lt;P class="p1"&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/161680"&gt;@Mits11&lt;/a&gt;&amp;nbsp; — just a heads-up: &lt;SPAN class="s2"&gt;&lt;STRONG&gt;Community Edition will sunset at the end of the year&lt;/STRONG&gt;&lt;/SPAN&gt; and will no longer be available after that point. The new home for users is &lt;A href="https://www.databricks.com/learn/free-edition" target="_self"&gt;&lt;SPAN class="s2"&gt;&lt;STRONG&gt;Databricks Free Edition&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/A&gt;, which is where all future resources and support are being directed.&lt;/P&gt;
&lt;P class="p1"&gt;Community Edition is still accessible for now to give everyone time to migrate their work and assets over to Free Edition. I’d recommend making that move soon so you’re fully set up before the transition.&lt;/P&gt;
&lt;P class="p1"&gt;To answer your questions directly though, here’s what’s happening in both cases and how to verify or control it.&lt;/P&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Why you saw 8 partitions for a single 181 MB CSV&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;The &lt;STRONG&gt;spark.sql.files.maxPartitionBytes&lt;/STRONG&gt; setting is an upper bound (default 128 MB), not a “target” count; Spark may create more than ceil(size/maxPartitionBytes) partitions depending on its file-scan logic and other advisory settings.&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Spark also considers a suggested minimum via &lt;STRONG&gt;spark.sql.files.minPartitionNum&lt;/STRONG&gt; (defaults to the cluster’s &lt;STRONG&gt;spark.default.parallelism&lt;/STRONG&gt;). That can push the reader toward “at least” that many partitions on file-based inputs, even if a simple size/128 MB estimate would be lower.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;In practice, with a single ~181 MB CSV and defaults, you can easily see 8 input partitions because the “minimum partitions” advisory aligns with the environment’s default parallelism (more on that below). This is consistent with guidance that large files are split into partitions around ~128 MB by default, but the actual count can be higher based on min partitions and split merging.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
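For intuition, the arithmetic behind that 8 can be sketched in plain Python. This is a simplified mirror of Spark's file-scan split sizing (the 4 MB surcharge is the default of spark.sql.files.openCostInBytes); the helper name and the simplified packing are illustrative, not Spark API:

```python
import math

MB = 1024 * 1024

def estimate_read_partitions(file_sizes, max_partition_bytes=128 * MB,
                             open_cost=4 * MB, min_partition_num=8):
    # Simplified sketch of Spark's file-scan split sizing:
    # total bytes include a per-file "open cost" surcharge
    total = sum(file_sizes) + open_cost * len(file_sizes)
    bytes_per_core = total / min_partition_num
    # effective split size is capped by maxPartitionBytes, floored by the
    # open cost, and shrinks toward totalBytes / minPartitionNum
    max_split = min(max_partition_bytes, max(open_cost, bytes_per_core))
    # each file is cut into chunks of at most max_split bytes
    return sum(math.ceil(size / max_split) for size in file_sizes)

# 181 MB file with the suggested minimum of 8 (CE's default parallelism)
print(estimate_read_partitions([181 * MB]))                       # -> 8
# lowering the suggested minimum recovers the naive ceil(181/128) estimate
print(estimate_read_partitions([181 * MB], min_partition_num=2))  # -> 2
```

Note what this implies: with a suggested minimum of 8, the effective split size drops to roughly 23 MB, so the 128 MB ceiling never comes into play for a single 181 MB file.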
&lt;DIV class="paragraph"&gt;How to verify in your notebook:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&lt;CODE&gt;python
spark.conf.get("spark.sql.files.maxPartitionBytes")
spark.conf.get("spark.sql.files.minPartitionNum")
sc.defaultParallelism
df = spark.read.option("header", "true").csv("/path/to/file.csv")
df.rdd.getNumPartitions
&lt;/CODE&gt;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;If you want exactly 2 partitions post-read:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;Use &lt;CODE&gt;df.repartition(2)&lt;/CODE&gt; (forces a shuffle, evenly redistributes).&lt;/LI&gt;
&lt;LI&gt;Or &lt;CODE&gt;df.coalesce(2)&lt;/CODE&gt; (no shuffle, merges existing partitions; better near the end of a pipeline if distribution is already balanced).&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;If you want fewer partitions at read-time (not guaranteed):&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;Increase &lt;STRONG&gt;spark.sql.files.maxPartitionBytes&lt;/STRONG&gt; (e.g., to 256 MB) and/or lower &lt;STRONG&gt;spark.sql.files.minPartitionNum&lt;/STRONG&gt;; just note minPartitionNum is “suggested,” not strictly enforced.&lt;/LI&gt;
&lt;/UL&gt;
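As a minimal sketch (assuming a live `spark` session in the notebook and a placeholder CSV path; both conf keys are standard Spark SQL options), the read-time knobs look like this:

```python
# Assumes an existing SparkSession named `spark`; the path is a placeholder.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(256 * 1024 * 1024))
spark.conf.set("spark.sql.files.minPartitionNum", "2")  # suggested, not enforced
df = spark.read.option("header", "true").csv("/path/to/file.csv")
df.rdd.getNumPartitions()
```

Because the effective split size also shrinks toward totalBytes / minPartitionNum, raising maxPartitionBytes alone may leave the count unchanged; lowering minPartitionNum is usually what makes the difference here.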
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Why the Spark UI shows “8 cores” on Community Edition&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;In local mode, Spark’s “cores” in the UI represent the number of worker threads (task slots), not the physical CPU cores of your machine or VM.&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Spark caps certain thread-related defaults with a hard limit of 8, and local[*] will use “up to all cores” but the effective concurrency often shows as 8 threads in the UI—hence the “Total Cores: 8” you’re seeing on CE even though your cluster page lists 2 cores.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;On Databricks Community Edition specifically, you’ll typically observe &lt;STRONG&gt;spark.default.parallelism = 8&lt;/STRONG&gt; in local mode, which aligns with what the Spark UI displays as available task slots, again reflecting threads/concurrency rather than the physical core count.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;What to check:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&lt;CODE&gt;python
sc.master            # often local[*] on CE
sc.defaultParallelism  # commonly 8 on CE
spark.sparkContext.uiWebUrl
&lt;/CODE&gt;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Quick takeaways&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;“8 partitions” on read is normal with the defaults (128 MB max per partition plus a suggested minimum aligned to default parallelism). If you need a specific partition count, set it explicitly with &lt;CODE&gt;repartition&lt;/CODE&gt;/&lt;CODE&gt;coalesce&lt;/CODE&gt; after reading.&lt;/LI&gt;
&lt;LI&gt;“8 cores” in the Spark UI on CE reflects Spark’s thread-based parallelism in local mode, not the advertised core count; the UI counts task slots/threads, which on CE comes out to 8.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Louis.&lt;/DIV&gt;</description>
      <pubDate>Thu, 23 Oct 2025 15:08:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135856#M50445</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-10-23T15:08:06Z</dc:date>
    </item>
    <item>
      <title>Re: Community edition cluster - UI shows incorrect cores</title>
      <link>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135861#M50447</link>
      <description>&lt;P&gt;Thank you Louis for the detailed explanation.&lt;/P&gt;&lt;P&gt;And thanks for the heads-up about the CE updates.&lt;/P&gt;&lt;P&gt;However, I have noticed this (screenshot below):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Mits11_0-1761234706608.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20992iF52FB8C6C1DCF58F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Mits11_0-1761234706608.png" alt="Mits11_0-1761234706608.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;spark.sql.files.minPartitionNum&lt;/STRONG&gt; does not return any result.&lt;/P&gt;&lt;P&gt;It's weird.&lt;/P&gt;&lt;P&gt;Am I missing anything?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 23 Oct 2025 15:54:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/community-edition-cluster-ui-shows-incorrect-cores/m-p/135861#M50447</guid>
      <dc:creator>Mits11</dc:creator>
      <dc:date>2025-10-23T15:54:17Z</dc:date>
    </item>
  </channel>
</rss>

