Python UDF in Unity Catalog - spark.sql error
04-01-2024 08:15 PM
I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to have functions available in a fairly straightforward manner without e.g. putting the function definitions in an extra notebook that I %run to make them available.
So when I try to follow https://learn.microsoft.com/en-us/azure/databricks/udf/unity-catalog I create the following function:
CREATE OR REPLACE FUNCTION catalog.schema.WatermarkRead_UC(ADLSLocation STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
WatermarkValue = spark.sql(f"SELECT WatermarkValue FROM PARQUET.`{ADLSLocation}/_watermark_log`").collect()[0][0]
return WatermarkValue
$$
And then call it:
SELECT catalog.schema.WatermarkRead_UC('abfss://container@storage.dfs.core.windows.net/path')
It returns the following error message:
NameError: name 'spark' is not defined
I tried all sorts of things, but I couldn't make it work. Wouldn't spark be supported out of the box? The same function works as expected when I simply define it in a separate notebook and %run that notebook; then I can call the function and it returns a value.
I wonder whether this is a current limitation, a bug, or an error in my code/design. Any help would be appreciated. Thanks
P.s.: I know I can register a UDF outside Unity Catalog, and that I can create a Python wheel to import from in my notebooks, but I'm after a UC-based solution if that is possible. Thanks
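For context, this is roughly what the working notebook-based version looks like (a sketch only; the helper notebook name and the function name below are mine, not anything official):
# In a helper notebook that the calling notebook pulls in via %run ./helpers
def watermark_read(adls_location: str) -> str:
    # In a notebook, `spark` is the ambient SparkSession, so spark.sql is available
    return spark.sql(
        f"SELECT WatermarkValue FROM PARQUET.`{adls_location}/_watermark_log`"
    ).collect()[0][0]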
Labels: Spark
04-01-2024 08:22 PM
I can see someone has asked a very similar question with the same error message:
https://community.databricks.com/t5/data-engineering/unable-to-use-sql-udf/td-p/61957
The OP there hasn't provided enough detail about their function, so no proper answer has appeared so far. I have gone through the four points listed in that thread to narrow down the root cause of the error, and I have.
Below is an even more simplified function definition (to rule out any issue with the cluster's access to the storage location) that fails with the same NameError: name 'spark' is not defined error:
CREATE OR REPLACE FUNCTION dev_fusion.log.WatermarkRead_UC(ADLSLocation STRING, WatermarkAttribute STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
WatermarkValue = spark.sql("SELECT 'value'").collect()[0][0]
return WatermarkValue
$$
And one that works:
CREATE OR REPLACE FUNCTION dev_fusion.log.WatermarkRead_UC(ADLSLocation STRING, WatermarkAttribute STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
WatermarkValue = 'Value'
return WatermarkValue
$$
The main difference is the spark.sql call.
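Since the Spark session doesn't appear to be reachable from inside the UC Python UDF body, the workaround I'm leaning towards (just a sketch; SomePureFunction is a hypothetical UC function that does only string handling) is to do the spark.sql read in the calling notebook and pass the result in:
# In the notebook: read the watermark with the notebook's SparkSession
adls_location = "abfss://container@storage.dfs.core.windows.net/path"
watermark_value = spark.sql(
    f"SELECT WatermarkValue FROM PARQUET.`{adls_location}/_watermark_log`"
).collect()[0][0]

# Pass the plain string to a UC function that makes no spark calls
# (SomePureFunction is hypothetical, standing in for any pure-Python UDF)
spark.sql(f"SELECT dev_fusion.log.SomePureFunction('{watermark_value}')").show()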
12-13-2024 05:12 PM
I came across the same problem. Inside a Unity Catalog Python UDF, spark.sql and spark.table don't work.
Adding
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
to the function body doesn't work either.
I don't know how to solve it yet.
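Roughly what I tried looks like this (a sketch; the function name is made up), and it doesn't work either:
CREATE OR REPLACE FUNCTION catalog.schema.SparkTest_UC(dummy STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
# Attempt to obtain a SparkSession inside the UDF body; the body seems to run
# in a restricted environment without a Spark context, so this does not help
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
return spark.sql("SELECT 'value'").collect()[0][0]
$$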

