Hello -
We're migrating a significant number of queries from T-SQL to Spark SQL.
"datediff(unit, start, end)" differs between the two implementations (in a good way). For the purposes of the migration, we'd like to stay as consistent as possible, so I was evaluating writing a custom, migration-specific UDF that emulates the T-SQL behavior, so people could transition over time with minimal disruption.
I'm bumping into something I have no idea how to achieve - allowing ENUMs to be exposed to SQL from a Python function.
For example -
SELECT
DATEDIFF(
DAY, -- <---- the enum unit
TIMESTAMP'1999-12-31 23:59:59',
TIMESTAMP'2000-01-01 23:59:58'
) diff_day
Note the "DAY" there is NOT a string - it's some form of ENUM exposed to Spark SQL. I'd like to achieve similar behavior if at all possible, but I'm not sure how to go about doing it. I'm writing the function in Python and then registering the UDF like so:
def my_date_diff(unit, start_date, end_date):
...
spark.udf.register("udf_my_date_diff", my_date_diff, IntegerType())
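For context, here's a minimal sketch of the kind of emulation I have in mind (the real body is elided above; this version is illustrative and takes the unit as a string, since that's the only thing I've gotten to resolve from SQL so far). T-SQL's DATEDIFF counts datepart *boundary crossings*, not elapsed intervals, which is why DATEDIFF(DAY, '1999-12-31 23:59:59', '2000-01-01 23:59:58') returns 1 in T-SQL even though less than 24 hours elapsed:

```python
from datetime import datetime

# Illustrative emulation of T-SQL DATEDIFF boundary-crossing semantics.
# Only a few units shown; the unit arrives as a plain string for now.
def my_date_diff(unit, start_date, end_date):
    unit = unit.upper()
    if unit == "DAY":
        # Count calendar-day boundaries crossed between the two timestamps
        return (end_date.date() - start_date.date()).days
    if unit == "MONTH":
        # Count month boundaries crossed, ignoring day-of-month
        return (end_date.year - start_date.year) * 12 + (end_date.month - start_date.month)
    if unit == "YEAR":
        # Count year boundaries crossed
        return end_date.year - start_date.year
    raise ValueError(f"unsupported unit: {unit}")
```

With that version, my example above returns 1 for DAY (one midnight boundary crossed), matching T-SQL rather than Spark's interval-based timestamp diff.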
When I call this in SQL, though, I get an error (which I kind of expected):
%sql
select udf_my_date_diff(DAY, TIMESTAMP'1999-12-31 23:59:59', TIMESTAMP'2000-01-01 23:59:58')
[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function parameter with name `DAY` cannot be resolved. SQLSTATE: 42703
Is there a way to expose my custom ENUM as a type when calling the UDF from SQL?