<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Python UDF in Unity Catalog - spark.sql error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/65256#M32757</link>
    <description>&lt;P&gt;I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to make functions available in a straightforward manner without, for example, putting the function definitions in an extra notebook that I %run to make them available.&lt;/P&gt;&lt;P&gt;Following &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/udf/unity-catalog" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/udf/unity-catalog&lt;/A&gt;,&amp;nbsp;I create the following function:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REPLACE FUNCTION catalog.schema.WatermarkRead_UC(ADLSLocation STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$

    WatermarkValue = spark.sql(f"SELECT WatermarkValue FROM PARQUET.`{ADLSLocation}/_watermark_log`").collect()[0][0]

    return WatermarkValue

$$&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And then call it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT catalog.schema.WatermarkRead_UC('abfss://container@storage.dfs.core.windows.net/path')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It returns the following error message:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;NameError: name 'spark' is not defined&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried all sorts of things but couldn't make it work. Shouldn't spark be supported out of the box? The same function works as expected when I simply define it in a separate notebook and %run that notebook; I can then use the function and it returns a value.&lt;/P&gt;&lt;P&gt;Is this a current limitation, a bug, or an error in my code / design? Any help would be appreciated. Thanks&lt;/P&gt;&lt;P&gt;P.s.: I know I can register a UDF outside Unity Catalog, and that I can create a Python wheel to import from in my notebooks, but I'm after a UC-based solution if that is possible. Thanks&lt;/P&gt;</description>
    <pubDate>Tue, 02 Apr 2024 03:15:02 GMT</pubDate>
    <dc:creator>MartinIsti</dc:creator>
    <dc:date>2024-04-02T03:15:02Z</dc:date>
    <item>
      <title>Python UDF in Unity Catalog - spark.sql error</title>
      <link>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/65256#M32757</link>
      <description>&lt;P&gt;I'm trying to utilise the option to create UDFs in Unity Catalog. That would be a great way to make functions available in a straightforward manner without, for example, putting the function definitions in an extra notebook that I %run to make them available.&lt;/P&gt;&lt;P&gt;Following &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/udf/unity-catalog" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/udf/unity-catalog&lt;/A&gt;,&amp;nbsp;I create the following function:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REPLACE FUNCTION catalog.schema.WatermarkRead_UC(ADLSLocation STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$

    WatermarkValue = spark.sql(f"SELECT WatermarkValue FROM PARQUET.`{ADLSLocation}/_watermark_log`").collect()[0][0]

    return WatermarkValue

$$&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And then call it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT catalog.schema.WatermarkRead_UC('abfss://container@storage.dfs.core.windows.net/path')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It returns the following error message:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;NameError: name 'spark' is not defined&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried all sorts of things but couldn't make it work. Shouldn't spark be supported out of the box? The same function works as expected when I simply define it in a separate notebook and %run that notebook; I can then use the function and it returns a value.&lt;/P&gt;&lt;P&gt;Is this a current limitation, a bug, or an error in my code / design? Any help would be appreciated. Thanks&lt;/P&gt;&lt;P&gt;P.s.: I know I can register a UDF outside Unity Catalog, and that I can create a Python wheel to import from in my notebooks, but I'm after a UC-based solution if that is possible. Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 02 Apr 2024 03:15:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/65256#M32757</guid>
      <dc:creator>MartinIsti</dc:creator>
      <dc:date>2024-04-02T03:15:02Z</dc:date>
    </item>
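The NameError above is consistent with Unity Catalog Python UDFs executing in an isolated Python environment that has no SparkSession injected, unlike a notebook, where `spark` is a predefined global. A minimal pure-Python simulation of that behaviour (illustrative only, not the actual Databricks sandbox; `watermark_read_uc` is a hypothetical stand-in for the UDF body):

```python
# Simulate why `spark` is undefined inside a Unity Catalog Python UDF body.
# In a notebook, `spark` is a predefined global; the UC UDF sandbox executes
# the body with globals that contain no SparkSession, so the name lookup fails.
# Illustrative simulation only, not the actual Databricks runtime.

udf_body = '''
def watermark_read_uc(adls_location):
    # Fails at call time: `spark` is not in this namespace's globals.
    value = spark.sql("SELECT 1").collect()[0][0]
    return value
'''

sandbox_globals = {}  # no `spark` binding, mimicking the UDF environment
exec(udf_body, sandbox_globals)

try:
    sandbox_globals["watermark_read_uc"]("abfss://container@storage/path")
except NameError as err:
    print(err)  # name 'spark' is not defined
```

The same function body succeeds in a notebook only because the notebook's globals already contain a `spark` object.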
    <item>
      <title>Re: Python UDF in Unity Catalog - spark.sql error</title>
      <link>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/65257#M32758</link>
      <description>&lt;P&gt;I can see someone has asked a very similar question with the same error message:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/unable-to-use-sql-udf/td-p/61957" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/unable-to-use-sql-udf/td-p/61957&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The OP hasn't yet provided sufficient details about their function, so no proper response has appeared so far. I have gone through the four listed points to narrow down the root cause of the error, and I have.&lt;/P&gt;&lt;P&gt;See below an even more simplified function definition (to rule out the possibility that the cluster lacks access to the storage location) that fails with the same&amp;nbsp;&lt;EM&gt;NameError: name 'spark' is not defined&lt;/EM&gt; error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REPLACE FUNCTION dev_fusion.log.WatermarkRead_UC(ADLSLocation STRING, WatermarkAttribute STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$

    WatermarkValue = spark.sql("SELECT 'value'").collect()[0][0]

    return WatermarkValue

$$&lt;/LI-CODE&gt;&lt;P&gt;And one that works:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REPLACE FUNCTION dev_fusion.log.WatermarkRead_UC(ADLSLocation STRING, WatermarkAttribute STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$

    WatermarkValue = 'Value'

    return WatermarkValue

$$&lt;/LI-CODE&gt;&lt;P&gt;The main difference is the&amp;nbsp;&lt;STRONG&gt;spark.sql&lt;/STRONG&gt; call.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Apr 2024 03:22:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/65257#M32758</guid>
      <dc:creator>MartinIsti</dc:creator>
      <dc:date>2024-04-02T03:22:32Z</dc:date>
    </item>
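The contrast between the failing and working definitions shows what a Unity Catalog Python UDF body may contain: plain Python over its arguments, with no Spark access. A sketch of a pure-Python body that actually uses its inputs (the path logic is hypothetical, for illustration only):

```python
# Sketch of what a UC Python UDF body may contain: pure Python over its
# arguments, no SparkSession. Mirrors the working variant above, extended
# to use its inputs (hypothetical logic, for illustration).
def watermark_read_uc(adls_location, watermark_attribute):
    # A UC Python UDF can transform its inputs with plain Python...
    log_path = adls_location.rstrip('/') + '/_watermark_log'
    # ...but it cannot query that path itself; it can only return the string.
    return log_path

print(watermark_read_uc('abfss://container@storage/path/', 'WatermarkValue'))
# abfss://container@storage/path/_watermark_log
```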
    <item>
      <title>Re: Python UDF in Unity Catalog - spark.sql error</title>
      <link>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/102125#M40974</link>
      <description>&lt;P&gt;I came across the same problem: inside a Unity Catalog UDF definition, spark.sql or spark.table doesn't work.&lt;/P&gt;&lt;P&gt;Adding&amp;nbsp;from pyspark.sql import SparkSession; spark = SparkSession.builder.getOrCreate() to the function body doesn't work either.&lt;/P&gt;&lt;P&gt;I don't know how to solve it yet.&lt;/P&gt;</description>
      <pubDate>Sat, 14 Dec 2024 01:12:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-udf-in-unity-catalog-spark-sql-error/m-p/102125#M40974</guid>
      <dc:creator>Linglin</dc:creator>
      <dc:date>2024-12-14T01:12:50Z</dc:date>
    </item>
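One way around the limitation, consistent with the workarounds already mentioned in the thread, is to keep the Spark call outside the function: read the watermark with spark.sql in the notebook, then pass the plain value into pure-Python logic of the kind a UC Python UDF body may contain. A sketch, where `read_watermark` is a hypothetical stand-in for the notebook-side spark.sql call:

```python
# Workaround sketch: keep Spark access outside the UDF. The notebook reads
# the watermark with spark.sql and hands the plain value to pure-Python
# logic, which is all a UC Python UDF body may contain.

def read_watermark(adls_location):
    # Hypothetical stand-in; in a notebook this would run something like:
    # spark.sql(f"SELECT WatermarkValue FROM PARQUET.`{adls_location}/_watermark_log`")
    return "2024-04-01T00:00:00"

def format_watermark_filter(value):
    # Pure-Python logic of the kind that is legal inside a UC Python UDF body.
    return f"ModifiedDate > '{value}'"

wm = read_watermark("abfss://container@storage/path")
print(format_watermark_filter(wm))  # ModifiedDate > '2024-04-01T00:00:00'
```

The split keeps the UDF itself free of any SparkSession dependency, at the cost of doing the read in the calling notebook.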
  </channel>
</rss>

