cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Function returns UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY

elgeo
Valued Contributor II

Hello experts. The below function in Databricks gives UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error. We didn't have this issue though in Oracle. Is this a limitation of Databricks? Just to note the final result returns only one row. Thank you in advance

create or replace function GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1) returns DATE

return

  with cte as

  ( select (t.DATE_FIELD), row_number() over (order by t.DATE_FIELD desc) rn

  from time t

  where t.DATE_FIELD = (p_date)

  and t.WORK_DAYS = 1

  ) select (cte.DATE_FIELD) as l_result from cte where rn= p_days_before;

4 REPLIES 4

Anonymous
Not applicable

@ELENI GEORGOUSI​ :

The UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY error occurs because the subquery in the WHERE clause contains a reference to the main query's column "p_date". This type of subquery is called a correlated scalar subquery and it is not supported in Databricks.

One way to rewrite this query to avoid the error is to use a JOIN or a subquery to filter the rows based on the DATE_FIELD column only, and then use the ROW_NUMBER function to get the row number of the desired row:

CREATE OR REPLACE FUNCTION GET_PREV_DAY(vdate DATE, vdays_bef decimal(38) DEFAULT 1)
RETURNS DATE
AS
$$
  SELECT DATE_FIELD
  FROM (
    SELECT DATE_FIELD, ROW_NUMBER() OVER (ORDER BY DATE_FIELD DESC) AS rn
    FROM time
    WHERE WORK_DAYS = 1 AND DATE_FIELD < vdate
  ) subquery
  WHERE rn = vdays_bef;
$$

In this rewritten query, the subquery filters the rows based on the WORK_DAYS and DATE_FIELD columns only, and then the main query selects the row with the desired row number. This approach should work in Databricks and also in Oracle.

Hello, we have changed the function as you mentioned but still the same issue is present.

%sql

create or replace function test_get(v_date DATE, days_before decimal(38) DEFAULT 1) returns DATE

return

select (xx) as l_result FROM

(select (t.xx), row_number() over (order by t.xx desc) as rn

from time_table t

where t.xx = (v_date)

and t.DA_WORK_DAYS = 1

) sub

where sub.rn= days_before;

analysis exception: Correlated scalar subqueries must be aggregated: Filter (cast(cast(rn#8811 as decimal(10,0)) as decimal(38,0)) = outer(p_days_before#8814))

+- SubqueryAlias sub

+- Window [xx#8846, row_number() windowspecdefinition(xx#8846 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#8811], [xx#8846 DESC NULLS LAST]

+- Project [xx#8846]

+- Filter ((xx#8846 = outer(p_date#8813)) AND (cast(DA_WORK_DAYS#8860 as int) = 1))

+- SubqueryAlias t

+- SubqueryAlias spark_catalog.time_table

+- Relation

TheofilosSt
New Contributor II

Dear experts,

we have not received any answer on this please respond.

TheofilosSt
New Contributor II

Hello @Suteja Kanuri​  can we have any respond on the above?

Thank you.

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!