<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89696#M37886</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;,&amp;nbsp;thank you for your advice; it actually works. I found the correct &lt;A href="https://docs.databricks.com/en/udf/unity-catalog.html" target="_self"&gt;document&lt;/A&gt; by following your code. The code I showed above is the &lt;A href="https://docs.databricks.com/en/udf/python.html" target="_self"&gt;scalar function&lt;/A&gt; version, which is not what I wanted.&lt;/P&gt;</description>
    <pubDate>Fri, 13 Sep 2024 00:53:06 GMT</pubDate>
    <dc:creator>guangyi</dc:creator>
    <dc:date>2024-09-13T00:53:06Z</dc:date>
    <item>
      <title>Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89433#M37793</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Here is how I define the UDF inside the file udf_define.py:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import length, udf
from pyspark.sql.types import IntegerType
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

def strlen(s):
    return length(s)

spark.udf.register("my_strlen_fn", strlen, IntegerType())&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;And here is how I use the UDF in the length_quality_check.sql:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REFRESH MATERIALIZED VIEW length_verification(
    CONSTRAINT valid_length_count EXPECT (count &amp;gt; 1230)
)
AS
select COUNT(o_comment) as count
from live.bronze_table 
where o_comment is not null and my_strlen_fn(o_comment) &amp;gt; 1&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;And here is how I integrate them in the DLT pipeline:&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  pipelines:
    asset_bundle_workflow_demo_pipeline:
      name: asset_bundle_workflow_demo_pipeline
      libraries:
        - file:
            path: ../src/udf_define.py
        - notebook:
            path: ../src/dlt_pipeline.ipynb
        - file:
            path: ../src/length_quality_check.sql&lt;/LI-CODE&gt;&lt;P&gt;And here is the error message I got when running the pipeline:&lt;/P&gt;&lt;P&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 95.0 failed 4 times, most recent failure: Lost task 2.3 in stage 95.0 (TID 167) (10.139.64.14 executor 0): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function my_strlen_fn(o_comment#3374) failed.&lt;BR /&gt;== Error ==&lt;BR /&gt;RuntimeError: SparkContext or SparkSession should be created first.&lt;BR /&gt;== Stacktrace ==&lt;/P&gt;&lt;P&gt;I don't understand this. I already create (or get the existing) Spark session inside the UDF definition file.&lt;/P&gt;&lt;P&gt;How can I solve this problem?&lt;/P&gt;</description>
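The error text points to a likely cause. `strlen` calls `pyspark.sql.functions.length`, which builds a Column expression and needs an active SparkContext. A registered Python UDF body runs in a Python worker process on the executors, and no SparkContext exists there. The call therefore fails no matter how udf_define.py obtains the session on the driver. Even with a context available, `length()` returns a Column rather than an integer. A minimal sketch of a fix uses Python's built-in `len()` instead, assuming the UDF only needs the character count of each string. The `register_udfs` helper is a hypothetical name introduced for this sketch:

```python
# udf_define.py (sketch): the UDF body uses plain Python only, so nothing in
# it needs a SparkContext inside the executor's Python worker.


def strlen(s):
    """Return the character length of s, or None for a NULL input."""
    # Built-in len() works in the Python worker; pyspark.sql.functions.length()
    # would try to build a Column there and raise the RuntimeError instead.
    if s is None:
        return None
    return len(s)


def register_udfs(spark):
    """Hypothetical helper: register strlen for SQL as my_strlen_fn."""
    # Imported lazily so the pure-Python helper above can be tested without PySpark.
    from pyspark.sql.types import IntegerType

    spark.udf.register("my_strlen_fn", strlen, IntegerType())
```

In udf_define.py you would call the helper once at module level with the existing session, for example `register_udfs(spark)`. The helper exists only so the pure-Python part can be imported and tested without PySpark. For this particular check, Spark SQL's built-in `length()` function (`length(o_comment) > 1`) should give the same result without any Python UDF.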
      <pubDate>Wed, 11 Sep 2024 09:20:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89433#M37793</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-09-11T09:20:52Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89439#M37796</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/109070"&gt;@guangyi&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;It seems that the Spark session might not be properly shared, could you try to change code responsible for obtaining spark session in a module?&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import length, udf
from pyspark.sql.types import IntegerType
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()

def strlen(s):
    return length(s)

spark.udf.register("my_strlen_fn", strlen, IntegerType())&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 09:51:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89439#M37796</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-11T09:51:56Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89463#M37806</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/109070"&gt;@guangyi&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I've just tested and following approach will work. Register similar Python UDF function in UC.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;CREATE OR REPLACE FUNCTION catalog.schema.GetLength(strlen STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return len(strlen) if strlen is not None else None
$$&lt;/LI-CODE&gt;&lt;P&gt;Then you can refer to that function in the materialized view:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE MATERIALIZED VIEW length_verification
AS
select COUNT(department) as count
from dev.default.employee
where department is not null and dev.default.GetLength(department) &amp;gt; 1;&lt;/LI-CODE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="szymon_dybczak_0-1726054233665.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11078i592B1A2C2B80D8A3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="szymon_dybczak_0-1726054233665.png" alt="szymon_dybczak_0-1726054233665.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 11:30:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89463#M37806</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-11T11:30:43Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89696#M37886</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&amp;nbsp;Thank you for your advice, actually it is works. I find the correct &lt;A href="https://docs.databricks.com/en/udf/unity-catalog.html" target="_self"&gt;document&lt;/A&gt; by following your code. The code I show above is the &lt;A href="https://docs.databricks.com/en/udf/python.html" target="_self"&gt;scalar function&lt;/A&gt; version which is not what I wanted.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Sep 2024 00:53:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89696#M37886</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-09-13T00:53:06Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to call UDF inside the Spark SQL: RuntimeError: SparkSession should be created</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89700#M37889</link>
      <description>&lt;P&gt;I also tried getActiveSession(), and it does not work.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Sep 2024 01:41:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-call-udf-inside-the-spark-sql-runtimeerror/m-p/89700#M37889</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-09-13T01:41:34Z</dc:date>
    </item>
  </channel>
</rss>

