Custom ENUM input as parameter for SQL UDF?

Data Engineering

Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Hello -

We're migrating from T-SQL to Spark SQL. We're migrating a significant number of queries.

"datediff(unit, start,end)" is different between these two implementations (in a good way). For the purpose of migration, we'd like to stay as consistent with possible, so I was evaluating writing a custom migration-specific UDF that emulated the t-sql behavior - so people could transition over time with minimal disruption.
I'm bumping into something I have no idea how to achieve - allowing ENUMs to be exposed to SQL from a Python function.
For example -

SELECT
DATEDIFF(
DAY, -- <---- the enum unit
TIMESTAMP'1999-12-31 23:59:59',
TIMESTAMP'2000-01-01 23:59:58'
) diff_day

Note the "DAY" there is NOT a string - it's some for of ENUM exposed to Spark SQL. I'd like to achieve that similar behavior if at all possible, but I'm not sure how to go about doing it. I'm writing the function in Python and then registering the UDF ala:

def my_date_diff(unit, start_date, end_date😞

...

spark.udf.register("udf_my_date_diff", my_date_diff, IntegerType())

When I call this in SQL, though, I get an error (which I kind of expected):

%sql

select udf_my_date_diff(DAY, TIMESTAMP'1999-12-31 23:59:59', TIMESTAMP'2000-01-01 23:59:58')

[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function parameter with name `DAY` cannot be resolved. SQLSTATE: 42703

Is there a way to expose my custom ENUM as a type when calling the UDF from SQL?

0 REPLIES 0

Photos

Upload Upload
URL URL
Saved Photos Saved Photos

Upload location

Upload location

Add Photos to Album:

New Album

Drag here to start uploading

Drag photos here or

Tap for upload options

You must install or upgrade to the latest version of Adobe Flash Player before you can upload images.