cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

EXISTS statement works incorrectly

daniil_terentye
New Contributor III

Hi everybody.

Looks like EXISTS statement works incorrectly.

If i execute the following statement in SQL Server it returns one row, as it should

WITH a AS (
    SELECT '1' AS id, 'Super Company' AS name
    UNION
    SELECT '2' AS id, 'SUPER COMPANY' AS name
),
b AS (
    SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company
)
SELECT
    *
FROM
    b
WHERE
    NOT EXISTS (
        SELECT
          1
        FROM
          a
        WHERE
          LOWER(a.Name) = LOWER(b.user_company)
        GROUP BY
          LOWER(a.Name)
        HAVING
            COUNT(DISTINCT a.Id) = 1
    )

But if i execute the same statement in Databricks notebook it returns nothing, that is absolutely incorrect.

Cluster version is 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)

Could somebody please look into this.

I know how to rewrite this statements using joins and so on. But, may be, it's possible to make EXISTS statement work in a correct way?

Thanks in advance.

Regards,

Daniil.

3 REPLIES 3

daniil_terentye
New Contributor III

I ended up with the following

Spark - Mixed case sensitivity in Spark DataFrame, Spark SQL, and/or Databricks Table - Stack Overflow

:(((((

It works, but it's extra-ugly.

Does it works better in spark 3.x?

No.

:(((

Behavior of Spark 3.1.1 is the same

daniil_terentye
New Contributor III

In newer versions of spark it's possible to use ANTI JOIN and SEMI JOIN

It looks this way:

WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY' AS name ), b AS ( SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company ) SELECT * FROM b ANTI JOIN ( SELECT LOWER(a.Name) as lower_name FROM a GROUP BY LOWER(a.Name) HAVING COUNT(DISTINCT a.Id) = 1 ) r ON lower(b.user_company) = r.lower_name