<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: UDF importing from other modules in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59462#M31416</link>
    <description>&lt;P&gt;the notebook/code you create the udf in, does that also reside in Repos?&lt;BR /&gt;AFAIK it is enough to import the module/function and register it as a UDF.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Feb 2024 12:51:37 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2024-02-06T12:51:37Z</dc:date>
    <item>
      <title>UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/58988#M31308</link>
      <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;I am using a PySpark UDF. The function is imported from a repo (in the Repos section) and registered as a UDF in the notebook. When the transformation runs I get a PythonException. It comes from the databricks.sdk.runtime.__init__.py file at the import "from dbruntime import UserNamespaceInitializer", which fails with ModuleNotFoundError: No module named 'dbruntime'.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_Greenwood_0-1706798998837.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6081i8BE3FFEAFA41EFBA/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Tom_Greenwood_0-1706798998837.png" alt="Tom_Greenwood_0-1706798998837.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;This UDF uses functions imported from other modules in the same repo (and third-party modules). Are there limitations on doing this?&lt;/P&gt;&lt;P&gt;I can get the transformation to run if I put all of the required code, including the imported functions, into a notebook, but this is undesirable: we have a lot of supporting functions and really want to go down the traditional repo route. It's worth noting that non-UDF imports from the repo do work (I've added the repo to sys.path), and running the transform on a small dataset also works, so I assume the problem is library availability on the workers.&lt;/P&gt;&lt;P&gt;Things I have tried that don't work:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Importing dbruntime in the notebook.&lt;/LI&gt;&lt;LI&gt;Registering all the modules used with spark.sparkContext.addPyFile("filepath"), although I'm not sure these would appear in the same namespace for importing in the Python file.&lt;/LI&gt;&lt;LI&gt;Using Runtime 13.3 and 14.3.&lt;/LI&gt;&lt;LI&gt;Registering the udf in the file with the udf decorator.&lt;/LI&gt;&lt;LI&gt;Importing dbruntime and databricks.sdk.runtime.* in the Python files.&lt;/LI&gt;&lt;LI&gt;Packaging the module into a wheel and installing it on the cluster (with and without registering the wheel with spark.sparkContext.addPyFile(&amp;lt;path-to-wheel&amp;gt;)).&lt;/LI&gt;&lt;LI&gt;Using the pyspark.pandas API with no udf registration (tried this first, as the transformation function is written to be used in a pandas df.apply).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Any tips and advice would be much appreciated!&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;</description>
      <pubDate>Thu, 01 Feb 2024 15:16:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/58988#M31308</guid>
      <dc:creator>Tom_Greenwood</dc:creator>
      <dc:date>2024-02-01T15:16:54Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59462#M31416</link>
      <description>&lt;P&gt;the notebook/code you create the udf in, does that also reside in Repos?&lt;BR /&gt;AFAIK it is enough to import the module/function and register it as a UDF.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 12:51:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59462#M31416</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-02-06T12:51:37Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59484#M31424</link>
      <description>&lt;P&gt;Thanks for your reply. The function that forms the udf is in Repos and the notebook is not. For most tests I am registering the udf in the repo after importing the function; however, I have also tested registering the udf in the file where it's written (with the udf decorator), and running the application of the udf in a file in the repo instead of the notebook, and I'm still getting the same error everywhere.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 15:53:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59484#M31424</guid>
      <dc:creator>Tom_Greenwood</dc:creator>
      <dc:date>2024-02-06T15:53:46Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59489#M31427</link>
      <description>&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def udf_1(*cols):
    """This works"""
    def helper_function(code: str) -&amp;gt; str:
        if code == "spam":
            return "foo"
        else:
            return "bar"

    return helper_function("HNA")

@udf(returnType=StringType())
def udf_2(*cols):
    """This causes the error"""
    return _helper_function("spam")
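    # Hedged aside (an editor's reading, not confirmed by Databricks docs):
    # udf_1 above works because helper_function is defined inside the UDF and
    # is pickled by value. _helper_function here is a module-level global, so
    # cloudpickle serialises udf_2 with a reference to its defining module;
    # the Spark worker then re-imports that module, and any top-level
    # "from databricks.sdk.runtime import *" that this triggers fails on the
    # worker with ModuleNotFoundError: No module named 'dbruntime'.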

def _helper_function(code: str) -&amp;gt; str:
    if code == "spam":
        return "foo"
    else:
        return "bar"&lt;/LI-CODE&gt;&lt;P&gt;This is a anonymised version of a test that I have created. What is strange is that the udf that fails mimics the structure of some of our module that do work (where helper functions are used).&lt;/P&gt;</description>
      <pubDate>Tue, 06 Feb 2024 16:52:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59489#M31427</guid>
      <dc:creator>Tom_Greenwood</dc:creator>
      <dc:date>2024-02-06T16:52:01Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59492#M31428</link>
      <description>&lt;P&gt;Sorry, a few mistakes were in my first answer. Here is the corrected version:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;Thanks for your reply. The function that forms the udf is in Repos, and the udf is registered and called in a notebook which is not. For most tests I am registering the udf in the notebook after importing the function; however, I have also tested registering the udf in the file where it's written (with the udf decorator), and running the application of the udf in a file in the repo instead of the notebook, and I'm still getting the same error everywhere.&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 06 Feb 2024 17:02:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59492#M31428</guid>
      <dc:creator>Tom_Greenwood</dc:creator>
      <dc:date>2024-02-06T17:02:21Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59590#M31453</link>
      <description>&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

@udf(returnType=IntegerType())
def udf_function(s):
    return your_function(s)&lt;/LI-CODE&gt;&lt;P&gt;where your_function is the imported function, so you actually create a wrapper.&lt;BR /&gt;Also do not forget to register the udf.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Feb 2024 12:53:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/59590#M31453</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2024-02-07T12:53:51Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/62957#M32140</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99162"&gt;@Tom_Greenwood&lt;/a&gt; did you ever find a solution to this? It looks like I have the same use case as you and am hitting the same error.&lt;/P&gt;&lt;P&gt;I believe earlier in the year I was able to run this same code with no errors, but now the udf can't seem to resolve the databricks imports.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Mar 2024 18:50:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/62957#M32140</guid>
      <dc:creator>DanC</dc:creator>
      <dc:date>2024-03-07T18:50:08Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/63224#M32196</link>
      <description>&lt;P&gt;No, the wrapper function I showed in the snippet was the only thing that worked, but it wasn't practical, so I've found a workaround that avoids using a udf at all.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Mar 2024 11:09:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/63224#M32196</guid>
      <dc:creator>Tom_Greenwood</dc:creator>
      <dc:date>2024-03-11T11:09:48Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/63666#M32311</link>
      <description>&lt;P&gt;I was getting a similar error (full traceback below), and determined that it's related to&amp;nbsp;&lt;A href="https://github.com/databricks/databricks-sdk-py/issues/360" target="_self"&gt;this issue&lt;/A&gt;. Setting the env variables&amp;nbsp;&lt;SPAN&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;DATABRICKS_HOST&lt;/FONT&gt; and&amp;nbsp;&lt;FONT face="lucida sans unicode,lucida sans"&gt;DATABRICKS_TOKEN&lt;/FONT&gt; as suggested in that GitHub issue resolved the problem for me (not a great solution, but workable for now).&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 48) (10.139.64.15 executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/runtime/__init__.py", line 79, in &amp;lt;module&amp;gt;
    from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 442, in init_auth
    self._header_factory = self._credentials_provider(self)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/credentials_provider.py", line 626, in __call__
    raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 104, in __init__
    self.init_auth()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 447, in init_auth
    raise ValueError(f'{self._credentials_provider.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 193, in _read_with_length
    return self.loads(obj)
  File "/databricks/spark/python/pyspark/serializers.py", line 571, in loads
    return cloudpickle.loads(obj, encoding=encoding)
  File "/Workspace/Repos/[REDACTED]", line 7, in &amp;lt;module&amp;gt;
    from databricks.sdk.runtime import spark
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/runtime/__init__.py", line 172, in &amp;lt;module&amp;gt;
    dbutils = RemoteDbUtils()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
    self._config = Config() if not config else config
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 109, in __init__
    raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 1825, in main
    func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
  File "/databricks/spark/python/pyspark/worker.py", line 1598, in read_udfs
    arg_offsets, f = read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index=0)
  File "/databricks/spark/python/pyspark/worker.py", line 735, in read_single_udf
    f, return_type = read_command(pickleSer, infile)
  File "/databricks/spark/python/pyspark/worker_util.py", line 67, in read_command
    command = serializer._read_with_length(file)
  File "/databricks/spark/python/pyspark/serializers.py", line 197, in _read_with_length
    raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/runtime/__init__.py", line 79, in &amp;lt;module&amp;gt;
    from dbruntime import UserNamespaceInitializer
ModuleNotFoundError: No module named 'dbruntime'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 442, in init_auth
    self._header_factory = self._credentials_provider(self)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/credentials_provider.py", line 626, in __call__
    raise ValueError(
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 104, in __init__
    self.init_auth()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 447, in init_auth
    raise ValueError(f'{self._credentials_provider.auth_type()} auth: {e}') from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 193, in _read_with_length
    return self.loads(obj)
  File "/databricks/spark/python/pyspark/serializers.py", line 571, in loads
    return cloudpickle.loads(obj, encoding=encoding)
  File "/Workspace/Repos/[REDACTED]", line 7, in &amp;lt;module&amp;gt;
    from databricks.sdk.runtime import spark
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/runtime/__init__.py", line 172, in &amp;lt;module&amp;gt;
    dbutils = RemoteDbUtils()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/dbutils.py", line 194, in __init__
    self._config = Config() if not config else config
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-892f3ee3-0955-4f40-8c06-f515eed8c2df/lib/python3.10/site-packages/databricks/sdk/config.py", line 109, in __init__
    raise ValueError(message) from e
ValueError: default auth: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2024 11:04:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/63666#M32311</guid>
      <dc:creator>DennisB</dc:creator>
      <dc:date>2024-03-14T11:04:03Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/67632#M33394</link>
      <description>&lt;P&gt;I've hit the same problem and isolated it to some degree. I can reproduce it in our main repo (with the Python functions &amp;amp; UDF wrappers installed as part of a package), but cannot reproduce it in a new minimal repo I made for testing. When I copy the package source into a non-repo folder, everything works fine.&lt;/P&gt;&lt;P&gt;Same type of error messages:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;...
ValueError: cannot configure default credentials, please check https://docs.databricks.com/en/dev-tools/auth.html#databricks-client-unified-authentication to configure credentials for your preferred authentication method.
...
ModuleNotFoundError: No module named 'dbruntime'
...&lt;/LI-CODE&gt;&lt;P&gt;I don't understand the root cause described in that GitHub issue, or why setting the environment variables may help.&lt;/P&gt;&lt;P&gt;I'm working with Databricks support to resolve it, and will try to share answers here.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2024 00:14:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/67632#M33394</guid>
      <dc:creator>JosiahJohnston</dc:creator>
      <dc:date>2024-04-30T00:14:37Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/80988#M36187</link>
      <description>&lt;P&gt;Did Databricks support manage to help? I'm having the same issue, so I would be very grateful if you could share any solutions or tips they gave you.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jul 2024 15:10:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/80988#M36187</guid>
      <dc:creator>josh_redmond</dc:creator>
      <dc:date>2024-07-29T15:10:45Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/91041#M38069</link>
      <description>&lt;P&gt;I faced this issue when I was running data ingestion on a Unity Catalog table where the cluster access mode was Shared.&lt;BR /&gt;&lt;BR /&gt;I changed it to Single user and re-ran it; now it is working.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AbdulMannan_0-1726740708842.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/11343iCD3FC822512B25D6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AbdulMannan_0-1726740708842.png" alt="AbdulMannan_0-1726740708842.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2024 10:12:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/91041#M38069</guid>
      <dc:creator>Abdul-Mannan</dc:creator>
      <dc:date>2024-09-19T10:12:22Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/91095#M38081</link>
      <description>&lt;P&gt;We eventually got it fixed, but I forgot to post right away. I don't remember if Databricks support helped resolve it, or if we figured it out on our own.&lt;/P&gt;&lt;P&gt;The root cause was that one stray (unimported) module in our library used dbutils at import time to load a secret into a global variable (credentials for an external S3 bucket): leftovers from pasting code from a notebook into a Python module. When we refactored to remove the offending lines from the library, all of the imported modules started working again for UDFs.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Sep 2024 16:56:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/91095#M38081</guid>
      <dc:creator>JosiahJohnston</dc:creator>
      <dc:date>2024-09-19T16:56:35Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/111511#M43918</link>
      <description>&lt;P&gt;Did you ever make any additional progress on this?&lt;/P&gt;&lt;P&gt;I'm hitting a similar issue attempting to reuse functions across UDFs when used within a DLT pipeline. It works fine outside of the DLT.&lt;/P&gt;&lt;P&gt;I can embed all the code into a single function and use it as a udf, but that limits code reuse.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Mar 2025 20:33:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/111511#M43918</guid>
      <dc:creator>drollason</dc:creator>
      <dc:date>2025-03-01T20:33:40Z</dc:date>
    </item>
    <item>
      <title>Re: UDF importing from other modules</title>
      <link>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/117410#M45491</link>
      <description>&lt;P&gt;I just ran into and solved this issue. My problem was that, in the Python script I loaded as a module, I defined the function I planned to use as a UDF separately from the function I actually called in my script. I believe that because of this, the worker applying the UDF didn't have the part where I import * from my module, which would run the "from databricks.sdk.runtime import *" that Databricks tells you to add to a module you plan to import. Defining the function used for applyInPandas inside the function where I actually call applyInPandas fixed it.&lt;/P&gt;&lt;P&gt;To illustrate the problematic layout:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def a():
    """Function to be used as a udf"""

def b():
    """Function that I'm actually calling"""
    df.applyInPandas(a)
    return&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 01 May 2025 15:50:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/udf-importing-from-other-modules/m-p/117410#M45491</guid>
      <dc:creator>rich_avery</dc:creator>
      <dc:date>2025-05-01T15:50:21Z</dc:date>
    </item>
  </channel>
</rss>

