cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Passing values from a CTE (Common Table Expression) to user-defined functions (UDF) in Spark SQL

GFrost
New Contributor
Hello everyone,

 

I'm trying to pass a value from a CTE to my function (UDF). Unfortunately, it's not working.
Here is the first variant:

 

WITH fx_date_new AS (
   SELECT CASE
         WHEN '2025-01-01' > current_date()
               THEN CAST(date_format(add_months(current_date(), -1), 'yyyyMM') AS INT)
               ELSE CAST(date_format(add_months('2025-01-01', -1), 'yyyyMM') AS INT)
         END AS fxdate
)

 

select calcccy (1000,
                     'USD',
                     'EUR',
                     select * from fx_date_new.fxdate
                    );
                   
The second variant with INNER JOIN/CROSS APPLY

WITH fx_date_new AS (
   SELECT CASE
         WHEN '2025-01-01' > current_date()
               THEN CAST(date_format(add_months(current_date(), -1), 'yyyyMM') AS INT)
               ELSE CAST(date_format(add_months('2025-01-01', -1), 'yyyyMM') AS INT)
         END AS fxdate
)

SELECT
    fx_date_new.fxdate,
    UDF.*
FROM
    fx_date_new
INNER JOIN     -- or CROSS APPLY
    gold.calcccy(
                    1000,
                    'USD',
                    'EUR',
                    fx_date_new.fxdate
                    ) AS UDF;

Neither of the two variants works.
Errorr: A column, variable, or function parameter with name `fx_date_new`.`fxdate` cannot be resolved

Do you have any idea how I can solve this?
 
 
 
1 REPLY 1

ggsmith
Contributor

I think the issue is in your subquery. You shouldn't have the entire cte query in parentheses. Only  the column from your CTE. Your FROM clause is inside your udf arguments. See if you can use the example below to fix the issue.


CREATE OR REPLACE FUNCTION my_udf(input_str1 STRING, input_str2 STRING, input_int INT)
RETURNS STRING
RETURN CONCAT('Processed: ', input_str1, ', ', input_str2, ', ', CAST(input_int AS STRING));

with t1 as (
select 1 as one, 'sample' as data1, 'example1' as data2
union all
select 2, 'example', 'example2'
union all
select 3, 'test', 'example3'
)

select
my_udf(data1, data2, one) AS processed_data
from t1;

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group