<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: performance issues using shared compute access mode in scala in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/89267#M37750</link>
    <description>&lt;P&gt;I can confirm this behaviour. When the same job runs on a shared cluster in "USER_ISOLATION" mode, with no changes to the job definition or the source data, the performance drop is significant. So much so that we would need to radically change how we process data.&lt;/P&gt;</description>
    <pubDate>Tue, 10 Sep 2024 10:00:26 GMT</pubDate>
    <dc:creator>prakharcode</dc:creator>
    <dc:date>2024-09-10T10:00:26Z</dc:date>
    <item>
      <title>performance issues using shared compute access mode in scala</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/61377#M31774</link>
      <description>&lt;P&gt;On our dev environment I created a cluster using the shared access mode for our devs to use (instead of separate single user clusters).&lt;/P&gt;&lt;P&gt;What I notice is that the performance of this cluster is terrible.&amp;nbsp; And I mean really terrible: notebook cells without any action, so just dataframe definitions, take minutes to complete, even though nothing has to be computed (lazy evaluation in Spark).&lt;/P&gt;&lt;P&gt;When I disable shared compute (so change to single user), performance is reasonable again.&lt;/P&gt;&lt;P&gt;Any ideas?&lt;BR /&gt;At the moment I am the only user on the cluster, so it can't be the cluster load.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Feb 2024 15:16:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/61377#M31774</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-02-21T15:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: performance issues using shared compute access mode in scala</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/61463#M31800</link>
      <description>&lt;P&gt;Thanks for the answer!&lt;/P&gt;&lt;P&gt;It seems that using shared access mode adds overhead.&amp;nbsp; The nodes/driver are not stressed at all (CPU/RAM/network).&lt;BR /&gt;We use UC only.&lt;BR /&gt;The cluster seems to be configured correctly (switching the same cluster to single user mode changes performance drastically).&lt;BR /&gt;Calculating a query plan should not take more than 5 minutes imo.&lt;BR /&gt;Physically printing the query plan takes about 40 secs in single user mode, but takes over 5 minutes in shared.&lt;BR /&gt;And the only thing that has changed is the access mode.&lt;BR /&gt;So my tentative conclusion is that shared mode adds a massive overhead.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2024 13:20:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/61463#M31800</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-02-22T13:20:51Z</dc:date>
    </item>
    <item>
      <title>Re: performance issues using shared compute access mode in scala</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/89267#M37750</link>
      <description>&lt;P&gt;I can confirm this behaviour. When the same job runs on a shared cluster in "USER_ISOLATION" mode, with no changes to the job definition or the source data, the performance drop is significant. So much so that we would need to radically change how we process data.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2024 10:00:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/89267#M37750</guid>
      <dc:creator>prakharcode</dc:creator>
      <dc:date>2024-09-10T10:00:26Z</dc:date>
    </item>
    <item>
      <title>Re: performance issues using shared compute access mode in scala</title>
      <link>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/121761#M46541</link>
      <description>&lt;P&gt;I am experiencing a &lt;STRONG&gt;huge&lt;/STRONG&gt; performance difference between shared and dedicated compute with &lt;SPAN&gt;&lt;FONT face="courier new,courier"&gt;spark.createDataFrame(pandas_df)&lt;/FONT&gt;. Same code, same data, but it completes in 6 s on a dedicated cluster and takes 6+ minutes on the shared cluster. That is a more than 60x difference!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jun 2025 22:46:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/performance-issues-using-shared-compute-access-mode-in-scala/m-p/121761#M46541</guid>
      <dc:creator>vr</dc:creator>
      <dc:date>2025-06-13T22:46:59Z</dc:date>
    </item>
  </channel>
</rss>

