Databricks Community

daniil_terentye · ‎07-19-2021

Hi everybody.

Looks like EXISTS statement works incorrectly.

If i execute the following statement in SQL Server it returns one row, as it should

WITH a AS (
    SELECT '1' AS id, 'Super Company' AS name
    UNION
    SELECT '2' AS id, 'SUPER COMPANY' AS name
),
b AS (
    SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company
)
SELECT
    *
FROM
    b
WHERE
    NOT EXISTS (
        SELECT
          1
        FROM
          a
        WHERE
          LOWER(a.Name) = LOWER(b.user_company)
        GROUP BY
          LOWER(a.Name)
        HAVING
            COUNT(DISTINCT a.Id) = 1
    )

But if i execute the same statement in Databricks notebook it returns nothing, that is absolutely incorrect.

Cluster version is 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)

Could somebody please look into this.

I know how to rewrite this statements using joins and so on. But, may be, it's possible to make EXISTS statement work in a correct way?

Thanks in advance.

Regards,

Daniil.

daniil_terentye · ‎07-19-2021

I ended up with the following

Spark - Mixed case sensitivity in Spark DataFrame, Spark SQL, and/or Databricks Table - Stack Overflow

:(((((

It works, but it's extra-ugly.

Does it works better in spark 3.x?

daniil_terentye · ‎07-20-2021

No.

:(((

Behavior of Spark 3.1.1 is the same

daniil_terentye · ‎07-21-2021

In newer versions of spark it's possible to use ANTI JOIN and SEMI JOIN

It looks this way:

WITH a AS ( SELECT '1' AS id, 'Super Company' AS name UNION SELECT '2' AS id, 'SUPER COMPANY' AS name ), b AS ( SELECT 'a@b.com' AS user_username, 'Super Company' AS user_company ) SELECT * FROM b ANTI JOIN ( SELECT LOWER(a.Name) as lower_name FROM a GROUP BY LOWER(a.Name) HAVING COUNT(DISTINCT a.Id) = 1 ) r ON lower(b.user_company) = r.lower_name

Databricks Community

EXISTS statement works incorrectly

Connect with Databricks Users in Your Area

Introducing an exclusively Databricks-hosted Assistant

How to present and share your Notebook insights in AI/BI Dashboards

Meet the Databricks MVPs

Now Hiring: Databricks Community Technical Moderator

Insights from a global survey of 1,100 technologists and interviews with 28 CIOs