<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Self-joins are blocked on remote tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/112032#M44079</link>
    <description>&lt;P&gt;Setting `blockSelfJoins` to false didn't help: a self-join SQL query that originally ran in seconds took over an hour to complete. Why did the behaviour of this access mode change so drastically across all our Databricks instances on Azure on 5 Mar? It was working perfectly in all regions before 5 Mar.&lt;/P&gt;</description>
    <pubDate>Fri, 07 Mar 2025 17:18:34 GMT</pubDate>
    <dc:creator>chris_y_1e</dc:creator>
    <dc:date>2025-03-07T17:18:34Z</dc:date>
    <item>
      <title>Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/111900#M44037</link>
      <description>&lt;P&gt;&lt;SPAN&gt;In our production Databricks workflow, one of the steps has been failing with this error since yesterday:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;org.apache.spark.SparkException: Self-joins are blocked on remote tables&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;We haven't changed the workflow or any Databricks configuration. Is there a reason why we are getting this error?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Mar 2025 09:53:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/111900#M44037</guid>
      <dc:creator>chris_y_1e</dc:creator>
      <dc:date>2025-03-06T09:53:01Z</dc:date>
    </item>
    <item>
      <title>Re: Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/112028#M44076</link>
      <description>&lt;P&gt;If you're using dedicated compute, please be aware that&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Self-joins are blocked by default when data filtering is called, but you can allow them by setting spark.databricks.remoteFiltering.blockSelfJoins to false on compute you are running these commands on.

Before you enable self-joins on a dedicated compute resource, be aware that a self-join query handled by the data filtering capability could return different snapshots of the same remote table.&lt;/LI-CODE&gt;
&lt;P&gt;This is quoted from the documentation &lt;A href="https://docs.databricks.com/aws/en/compute/single-user-fgac#limitations" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 16:58:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/112028#M44076</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2025-03-07T16:58:13Z</dc:date>
    </item>
    <item>
      <title>Re: Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/112032#M44079</link>
      <description>&lt;P&gt;Setting `blockSelfJoins` to false didn't help: a self-join SQL query that originally ran in seconds took over an hour to complete. Why did the behaviour of this access mode change so drastically across all our Databricks instances on Azure on 5 Mar? It was working perfectly in all regions before 5 Mar.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 17:18:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/112032#M44079</guid>
      <dc:creator>chris_y_1e</dc:creator>
      <dc:date>2025-03-07T17:18:34Z</dc:date>
    </item>
    <item>
      <title>Re: Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113083#M44418</link>
      <description>&lt;P&gt;I'm seeing this too, but only on my personal cluster with the following config:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Access Mode: Dedicated&lt;/LI&gt;&lt;LI&gt;Policy: Unrestricted&lt;/LI&gt;&lt;LI&gt;Runtime info: &lt;SPAN&gt;15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;If I use the IT department's shared clusters, which run either 14.3 LTS or 16.2 with Access Mode = Shared, the problem goes away.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The condition that triggers the failure is odd:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;DF1 = pull some info from VIEW "foo"&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;DF2 = pull some different info from VIEW "foo"&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;DF3 = DF1.unionByName(DF2)&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;This would not seem to be a "self-join" as described in the error message, other than the fact that the first two dataframes are different cuts of the same view.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 20:36:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113083#M44418</guid>
      <dc:creator>TomRenish</dc:creator>
      <dc:date>2025-03-19T20:36:48Z</dc:date>
    </item>
    <item>
      <title>Re: Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113084#M44419</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152239"&gt;@chris_y_1e&lt;/a&gt; Take a look at your cluster config to see if you're hitting the same condition I was. See my comment above.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 20:40:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113084#M44419</guid>
      <dc:creator>TomRenish</dc:creator>
      <dc:date>2025-03-19T20:40:12Z</dc:date>
    </item>
    <item>
      <title>Re: Self-joins are blocked on remote tables</title>
      <link>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113315#M44504</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/49102"&gt;@TomRenish&lt;/a&gt; Yeah, we fixed it by switching the job to shared compute. That access mode is called "USER_ISOLATION" in the `job.yaml` file:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;data_security_mode&lt;/SPAN&gt;: USER_ISOLATION&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 21 Mar 2025 15:28:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/self-joins-are-blocked-on-remote-tables/m-p/113315#M44504</guid>
      <dc:creator>chris_y_1e</dc:creator>
      <dc:date>2025-03-21T15:28:16Z</dc:date>
    </item>
  </channel>
</rss>

