Databricks

elgeo · ‎04-18-2023

Hello experts. The below function in Databricks gives UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error. We didn't have this issue though in Oracle. Is this a limitation of Databricks? Just to note the final result returns only one row. Thank you in advance

create or replace function GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1) returns DATE

return

with cte as

( select (t.DATE_FIELD), row_number() over (order by t.DATE_FIELD desc) rn

from time t

where t.DATE_FIELD = (p_date)

and t.WORK_DAYS = 1

) select (cte.DATE_FIELD) as l_result from cte where rn= p_days_before;

Anonymous · ‎04-20-2023

@ELENI GEORGOUSI :

The UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error occurs because the subquery in the WHERE clause contains a reference to the main query's column "p_date". This type of subquery is called a correlated scalar subquery and it is not supported in Databricks.

One way to rewrite this query to avoid the error is to use a JOIN or a subquery to filter the rows based on the DATE_FIELD column only, and then use the ROW_NUMBER function to get the row number of the desired row:

CREATE OR REPLACE FUNCTION GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1)
RETURNS DATE
AS
$$
  SELECT DATE_FIELD
  FROM (
    SELECT DATE_FIELD, ROW_NUMBER() OVER (ORDER BY DATE_FIELD DESC) AS rn
    FROM time
    WHERE WORK_DAYS = 1 AND DATE_FIELD < vdate
  ) subquery
  WHERE rn = vdays_bef;
$$

In this rewritten query, the subquery filters the rows based on the WORK_DAYS and DATE_FIELD columns only, and then the main query selects the row with the desired row number. This approach should work in Databricks and also in Oracle.

TheofilosSt · ‎04-21-2023

Hello, we have changed the function as you mentioned but still the same issue is present.

%sql

create or replace function test_get(v_date DATE, days_before decimal(38) DEFAULT 1) returns DATE

return

select (xx) as l_result FROM

(select (t.xx), row_number() over (order by t.xx desc) as rn

from time_table t

where t.xx = (v_date)

and t.DA_WORK_DAYS = 1

) sub

where sub.rn= days_before;

analysis exception: Correlated scalar subqueries must be aggregated: Filter (cast(cast(rn#8811 as decimal(10,0)) as decimal(38,0)) = outer(p_days_before#8814))

+- SubqueryAlias sub

+- Window [xx#8846, row_number() windowspecdefinition(xx#8846 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#8811], [xx#8846 DESC NULLS LAST]

+- Project [xx#8846]

+- Filter ((xx#8846 = outer(p_date#8813)) AND (cast(DA_WORK_DAYS#8860 as int) = 1))

+- SubqueryAlias t

+- SubqueryAlias spark_catalog.time_table

+- Relation