<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic EXISTS statement works incorrectly in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17550#M11547</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi everybody.&lt;/P&gt;
&lt;P&gt;Looks like EXISTS statement works incorrectly.&lt;/P&gt;
&lt;P&gt;If i execute the following statement in SQL Server it returns one row, as it should&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;WITH a AS (
    SELECT '1' AS id, 'Super Company' AS name
    UNION
    SELECT '2' AS id, 'SUPER COMPANY' AS name
),
b AS (
    SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company
)
SELECT
    *
FROM
    b
WHERE
    NOT EXISTS (
        SELECT
          1
        FROM
          a
        WHERE
          LOWER(a.Name) = LOWER(b.user_company)
        GROUP BY
          LOWER(a.Name)
        HAVING
            COUNT(DISTINCT a.Id) = 1
    )
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But if i execute the same statement in Databricks notebook it returns nothing, that is absolutely incorrect.&lt;/P&gt;
&lt;P&gt;Cluster version is 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)&lt;/P&gt;
&lt;P&gt;Could somebody please look into this.&lt;/P&gt;
&lt;P&gt;I know how to rewrite this statements using joins and so on. But, may be, it's possible to make EXISTS statement work in a correct way?&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Daniil.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Jul 2021 19:52:34 GMT</pubDate>
    <dc:creator>daniil_terentye</dc:creator>
    <dc:date>2021-07-19T19:52:34Z</dc:date>
    <item>
      <title>EXISTS statement works incorrectly</title>
      <link>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17550#M11547</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi everybody.&lt;/P&gt;
&lt;P&gt;Looks like EXISTS statement works incorrectly.&lt;/P&gt;
&lt;P&gt;If i execute the following statement in SQL Server it returns one row, as it should&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;WITH a AS (
    SELECT '1' AS id, 'Super Company' AS name
    UNION
    SELECT '2' AS id, 'SUPER COMPANY' AS name
),
b AS (
    SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company
)
SELECT
    *
FROM
    b
WHERE
    NOT EXISTS (
        SELECT
          1
        FROM
          a
        WHERE
          LOWER(a.Name) = LOWER(b.user_company)
        GROUP BY
          LOWER(a.Name)
        HAVING
            COUNT(DISTINCT a.Id) = 1
    )
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But if i execute the same statement in Databricks notebook it returns nothing, that is absolutely incorrect.&lt;/P&gt;
&lt;P&gt;Cluster version is 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)&lt;/P&gt;
&lt;P&gt;Could somebody please look into this.&lt;/P&gt;
&lt;P&gt;I know how to rewrite this statements using joins and so on. But, may be, it's possible to make EXISTS statement work in a correct way?&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Daniil.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jul 2021 19:52:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17550#M11547</guid>
      <dc:creator>daniil_terentye</dc:creator>
      <dc:date>2021-07-19T19:52:34Z</dc:date>
    </item>
    <item>
      <title>Re: EXISTS statement works incorrectly</title>
      <link>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17551#M11548</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I ended up with the following&lt;/P&gt;
&lt;P&gt;Spark - Mixed case sensitivity in Spark DataFrame, Spark SQL, and/or Databricks Table - Stack Overflow&lt;/P&gt;
&lt;P&gt;:(((((&lt;/P&gt;
&lt;P&gt;It works, but it's extra-ugly.&lt;/P&gt;
&lt;P&gt;Does it works better in spark 3.x?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jul 2021 20:13:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17551#M11548</guid>
      <dc:creator>daniil_terentye</dc:creator>
      <dc:date>2021-07-19T20:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: EXISTS statement works incorrectly</title>
      <link>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17552#M11549</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;No.&lt;/P&gt;
&lt;P&gt;:(((&lt;/P&gt;
&lt;P&gt;Behavior of Spark 3.1.1 is the same&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2021 06:04:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17552#M11549</guid>
      <dc:creator>daniil_terentye</dc:creator>
      <dc:date>2021-07-21T06:04:02Z</dc:date>
    </item>
    <item>
      <title>Re: EXISTS statement works incorrectly</title>
      <link>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17553#M11550</link>
      <description>&lt;P&gt;&lt;/P&gt; 
&lt;P&gt;In newer versions of spark it's possible to use ANTI JOIN and SEMI JOIN&lt;/P&gt; 
&lt;P&gt;It looks this way:&lt;/P&gt;WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY' AS name ), b AS ( SELECT 
'a@b.com' AS user_username, 'Super Company' AS user_company ) SELECT * FROM b ANTI JOIN ( SELECT LOWER(a.Name) as lower_name FROM a GROUP BY LOWER(a.Name) HAVING COUNT(DISTINCT a.Id) = 1 ) r ON lower(b.user_company) = r.lower_name 
&lt;P&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2021 13:52:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exists-statement-works-incorrectly/m-p/17553#M11550</guid>
      <dc:creator>daniil_terentye</dc:creator>
      <dc:date>2021-07-21T13:52:23Z</dc:date>
    </item>
  </channel>
</rss>

