
UDF not working inside transform() & lambda (SQL)

r0nald
New Contributor II

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a workaround that isn't overly bloated?

%sql
create or replace function status_map(status int)
returns string
return map(
  10, "STATUS_10",
  20, "STATUS_20"
)[status];

with s1 as (
  select array(map('v1', 10), map('v1', 20)) as arr
)
-- WORKS: select transform(arr, e -> e.v1) from s1
-- WORKS: select transform(arr, e -> status_map(10)) from s1

-- WHY THIS NO WORKS?!?!?111
select transform(arr, e -> status_map(e.v1)) from s1

The last select fails with:

Error in SQL statement: AnalysisException: Resolved attribute(s) e#134732 missing from in operator !Project [cast(lambda e#134732[v1] as int) AS status#134734].; line 8 pos 27 

The behavior is the same if I use named_struct() instead of map().
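
For reference, the function resolves fine outside a lambda, and hand-inlining its body into the lambda also works; it's exactly that duplication I'd like to avoid (a sketch on the same toy data):

%sql
-- sanity check: the UDF works when called directly
select status_map(10); -- => STATUS_10

-- hand-inlining the function body works, but duplicates the mapping
-- at every call site
with s1 as (
  select array(map('v1', 10), map('v1', 20)) as arr
)
select transform(arr, e -> map(10, "STATUS_10", 20, "STATUS_20")[e.v1]) from s1;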

3 REPLIES

urschnabeltier
New Contributor II

This doesn't answer the question, but someone apparently asked a similar question on StackOverflow too and likewise got zero replies: https://stackoverflow.com/questions/75010883/error-when-passing-spark-sql-higher-order-functions-lam...

I found this thread and that question because I've also been having the same problem.

Given that Python UDFs apparently aren't supported either (https://stackoverflow.com/questions/53761600/how-to-use-transform-higher-order-function/53762650#537...), and that SQL UDFs are a newer feature than Python UDFs, SQL UDFs are probably just not supported for use with `transform`.

I hope I'm wrong, though, because it's very frustrating. SQL UDFs are basically essential for making code modular and legible, and code that uses transform and arrays is already hard enough to maintain without having to inline what should be a UDF. Arrays, structs, and other complex data types are supposed to be a major differentiator for Spark SQL versus other SQL implementations, and for Databricks versus other data warehouses, yet the basic features for reducing code complexity and keeping code modular and maintainable aren't supported when working with those types.
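
One workaround that keeps the SQL UDF, at the cost of restructuring the query, is to move the call out of the lambda entirely by exploding the array first. A sketch against the toy schema from the question:

%sql
with s1 as (
  select array(map('v1', 10), map('v1', 20)) as arr
)
-- after the lateral view, e is an ordinary column rather than a lambda
-- variable, so the UDF call resolves normally
select status_map(e.v1) as status
from s1
lateral view explode(arr) t as e;

If the results need to end up back in an array, collect_list can re-aggregate them, though it doesn't guarantee element order.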

-werners-
Esteemed Contributor III

The transform function in SQL is not the same as its Scala/PySpark counterpart; it is in fact a map() over the elements of an array.
Here is some interesting info

I agree that functions are essential for code modularity. That's why I prefer Scala/PySpark over SQL.
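
To illustrate the naming clash: the SQL higher-order function transform maps a lambda over each element of an array, while Dataset.transform in Scala/PySpark chains whole-DataFrame transformations. A minimal example of the SQL one:

%sql
-- SQL transform: an element-wise map over an array
select transform(array(1, 2, 3), x -> x * 10) as mapped;
-- => [10, 20, 30]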

I agree with you that Scala UDFs are helpful. But given that I personally know very little Scala, and that Python UDFs are much less performant than SQL UDFs and Scala UDFs, I prefer to use SQL UDFs when possible. It takes me twenty times as long, or worse, to implement the same logic in Scala as to write it in SQL. If I were proficient in Scala, I would probably use Scala UDFs more often, since they are the best supported and most flexible.

Regarding the distinction between the Spark SQL map-style higher-order function transform and the different transform in PySpark/Scala: I don't think there was any confusion in the question. There wasn't for me personally, at least, although I can't speak for the OP of course. I do wish they weren't named the same, if for no other reason than to make Google search results easier and quicker to process.

As far as using the Spark SQL transform goes (as opposed to the PySpark/Scala transform), it seems that only Scala UDFs are supported inside it. Again, I find that surprising, because I had always assumed that under the hood SQL UDFs were "inlined" or "macroed" almost directly into Spark's Catalyst expressions. Based on this behavior, that seems to have been an oversimplification: the error message above shows the UDF body being resolved as its own Project operator, in which the lambda variable e is no longer in scope.
