cancel
Showing results for 
Search instead for 
Did you mean: 

Function returns UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY

elgeo
Valued Contributor II

Hello experts. The below function in Databricks gives UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error. We didn't have this issue though in Oracle. Is this a limitation of Databricks? Just to note the final result returns only one row. Thank you in advance

create or replace function GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1) returns DATE

return

  with cte as

  ( select (t.DATE_FIELD), row_number() over (order by t.DATE_FIELD desc) rn

  from time t

  where t.DATE_FIELD = (p_date)

  and t.WORK_DAYS = 1

  ) select (cte.DATE_FIELD) as l_result from cte where rn= p_days_before;

4 REPLIES 4

Anonymous
Not applicable

@ELENI GEORGOUSI​ :

The UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error occurs because the subquery in the WHERE clause contains a reference to the main query's column "p_date". This type of subquery is called a correlated scalar subquery and it is not supported in Databricks.

One way to rewrite this query to avoid the error is to use a JOIN or a subquery to filter the rows based on the DATE_FIELD column only, and then use the ROW_NUMBER function to get the row number of the desired row:

CREATE OR REPLACE FUNCTION GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1)
RETURNS DATE
AS
$$
  SELECT DATE_FIELD
  FROM (
    SELECT DATE_FIELD, ROW_NUMBER() OVER (ORDER BY DATE_FIELD DESC) AS rn
    FROM time
    WHERE WORK_DAYS = 1 AND DATE_FIELD < vdate
  ) subquery
  WHERE rn = vdays_bef;
$$

In this rewritten query, the subquery filters the rows based on the WORK_DAYS and DATE_FIELD columns only, and then the main query selects the row with the desired row number. This approach should work in Databricks and also in Oracle.

Hello, we have changed the function as you mentioned but still the same issue is present.

%sql

create or replace function test_get(v_date DATE, days_before decimal(38) DEFAULT 1) returns DATE

return

select (xx) as l_result FROM

(select (t.xx), row_number() over (order by t.xx desc) as rn

from time_table t

where t.xx = (v_date)

and t.DA_WORK_DAYS = 1

) sub

where sub.rn= days_before;

analysis exception: Correlated scalar subqueries must be aggregated: Filter (cast(cast(rn#8811 as decimal(10,0)) as decimal(38,0)) = outer(p_days_before#8814))

+- SubqueryAlias sub

+- Window [xx#8846, row_number() windowspecdefinition(xx#8846 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#8811], [xx#8846 DESC NULLS LAST]

+- Project [xx#8846]

+- Filter ((xx#8846 = outer(p_date#8813)) AND (cast(DA_WORK_DAYS#8860 as int) = 1))

+- SubqueryAlias t

+- SubqueryAlias spark_catalog.time_table

+- Relation

TheofilosSt
New Contributor II

Dear experts,

we have not received any answer on this please respond.

TheofilosSt
New Contributor II

Hello @Suteja Kanuri​  can we have any respond on the above?

Thank you.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.