<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Out of memory error when installing environment dependencies of UC Python UDF in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124284#M3592</link>
    <description>&lt;P&gt;Could you invoke an action on that resulting dataframe (e.g.,&amp;nbsp;&lt;EM&gt;_sqldf.display()&lt;/EM&gt;) to see what happens when the UDF runs for real?&lt;/P&gt;</description>
    <pubDate>Mon, 07 Jul 2025 09:45:20 GMT</pubDate>
    <dc:creator>carlosjuribe</dc:creator>
    <dc:date>2025-07-07T09:45:20Z</dc:date>
    <item>
      <title>Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124224#M3588</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I've created a small UC Python UDF to test whether it works with custom dependencies (a new Public Preview feature), and every time I get an OOM error with this message:&lt;/P&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;STRONG&gt;[UDF_ENVIRONMENT_USER_ERROR.OUT_OF_MEMORY] Failed to install UDF dependencies for &amp;lt;catalog&amp;gt;.&amp;lt;schema&amp;gt;.&amp;lt;function&amp;gt;. Installation crashed due to running out of memory. SQLSTATE: 39000&lt;/STRONG&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;Context:&amp;nbsp;&lt;/STRONG&gt;The function loads a spaCy language model, processes a string, and returns the number of "PERSON" entities found in that text.&amp;nbsp;&lt;EM&gt;With a blank (&lt;/EM&gt;lighter&lt;EM&gt;) model, &lt;U&gt;it works fine&lt;/U&gt;, &lt;STRONG&gt;but&lt;/STRONG&gt; with the basic "en_core_web_sm" model, it OOMs&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Hypothesis:&amp;nbsp;&lt;/STRONG&gt;It looks to me like the small language model that the function loads is too big for it to handle, so it crashes. A solution could be to increase the memory somehow.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&amp;nbsp;&lt;/STRONG&gt;Is there a way to configure (increase) the memory that the underlying process uses so that the UDF doesn't crash due to OOM? Or is there any other way to solve this?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Minimal Working Example (MWE):&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;1. Create a small mock dataset&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import Row

data = [
    Row(id=1, document="John Smith was born in London in 1999"),
    Row(id=2, document="Alice Blake went to Colorado last winter"),
    Row(id=3, document="Michael Johnson visited Paris in 2018"),
    Row(id=4, document="Emma Davis moved to New York in 2005"),
    Row(id=5, document="David Brown traveled to Tokyo in 2020")
]

spark.createDataFrame(data).write.saveAsTable('simple_documents')&lt;/LI-CODE&gt;&lt;P&gt;2. Create the UC Python UDF&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql("""
CREATE OR REPLACE FUNCTION count_entities_of_type(document STRING, of_type STRING) RETURNS FLOAT
LANGUAGE PYTHON
PARAMETER STYLE PANDAS
HANDLER 'handler_function'
ENVIRONMENT (
  dependencies = '["spacy", "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl"]',
  environment_version = 'None'
)
AS $$
'''
of_type: a valid SpaCy NER label type (e.g., PERSON, ORG, GPE, DATE, etc.)
'''
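import pandas as pd
import spacy
from typing import Iterator, Tuple

# expensive up-front computation:
nlp = spacy.load('en_core_web_sm')
# nlp = spacy.blank('en')

# With two UDF parameters, each batch arrives as a tuple of two Series.
def handler_function(batches: Iterator[Tuple[pd.Series, pd.Series]]) -&amp;gt; Iterator[pd.Series]:
    def find_and_count_entities(text: str, of_type: str) -&amp;gt; float:
        doc = nlp(text)
        entities = [ent for ent in doc.ents
                        if ent.label_ == of_type]
        return float(len(entities))

    for document_series, type_series in batches:
        # Pair each document with its requested label and count row by row.
        counts = [find_and_count_entities(text, label)
                  for text, label in zip(document_series, type_series)]
        yield pd.Series(counts, index=document_series.index)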
$$
""")&lt;/LI-CODE&gt;&lt;P&gt;3. Invoke the function against the mock table (this fails)&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql('''
SELECT 
  *
, count_entities_of_type(document, 'PERSON') AS n_entities
FROM simple_documents
LIMIT 1
''')&lt;/LI-CODE&gt;&lt;P&gt;Any pointers to resources showing how to increase memory, or explaining whether this problem is even solvable in the first place, are greatly appreciated. Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 06 Jul 2025 17:22:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124224#M3588</guid>
      <dc:creator>carlosjuribe</dc:creator>
      <dc:date>2025-07-06T17:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124268#M3589</link>
      <description>&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Translator&lt;DIV class=""&gt;&amp;nbsp;&lt;SPAN&gt;Hello Carlos,&amp;nbsp;&lt;/SPAN&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;&lt;BR /&gt;What I see from the code is you are already running the code on spark.sql which should be fine. I am creating a repo; please wait.&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 07 Jul 2025 08:57:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124268#M3589</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-07-07T08:57:03Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124272#M3590</link>
      <description>&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;Translator&lt;DIV class=""&gt;Hello, Carlos,&lt;DIV class=""&gt;Can you try with serverless once? I got the below result.&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vkzaffer_0-1751879151238.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18004i1FA00FA175B92E6E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vkzaffer_0-1751879151238.png" alt="vkzaffer_0-1751879151238.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 07 Jul 2025 09:06:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124272#M3590</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-07-07T09:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124283#M3591</link>
      <description>&lt;P&gt;Thanks for the response! In fact, I only used spark.sql because the code snippet tool of the message didn't allow me to use SQL for syntax highlighting, only Python. But that doesn't matter, the function creation works fine.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 09:43:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124283#M3591</guid>
      <dc:creator>carlosjuribe</dc:creator>
      <dc:date>2025-07-07T09:43:39Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124284#M3592</link>
      <description>&lt;P&gt;Could you invoke an action on that resulting dataframe (e.g.,&amp;nbsp;&lt;EM&gt;_sqldf.display()&lt;/EM&gt;) to see what happens when the UDF runs for real?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 09:45:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124284#M3592</guid>
      <dc:creator>carlosjuribe</dc:creator>
      <dc:date>2025-07-07T09:45:20Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124505#M3607</link>
      <description>&lt;P&gt;Hello Carlos,&lt;/P&gt;&lt;P&gt;Good day!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vkzaffer_0-1752012998863.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18062iB38817A106DC1B79/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vkzaffer_0-1752012998863.png" alt="vkzaffer_0-1752012998863.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I created a repro for this and am getting the same error. I used serverless, and even with 32 GB serverless I got the same error.&lt;/P&gt;&lt;P&gt;Sorry for the delay; I was busy with the MSFT layoffs, which affected me as well. Resolving issues like this has become a passion of mine, so I am doing it in my free time.&lt;/P&gt;&lt;P&gt;I would highly recommend opening a support ticket. If you are on AWS, I am not sure about the procedure, but if you are using Azure, open a ticket with Azure support so that the Azure Databricks team can help you.&lt;/P&gt;&lt;P&gt;Alternatively, you can raise a ticket with Databricks directly, but they will ask for the case number from Azure.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Please raise the ticket using this link:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://help.databricks.com/s/contact-us?ReqType=training" target="_blank" rel="nofollow noopener noreferrer"&gt;https://help.databricks.com/s/contact-us?ReqType=training&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;Please explain the issue clearly so that it is easy for the support team to help.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 22:17:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124505#M3607</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-07-08T22:17:14Z</dc:date>
    </item>
    <item>
      <title>Re: Out of memory error when installing environment dependencies of UC Python UDF</title>
      <link>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124506#M3608</link>
      <description>&lt;P&gt;I also tried with a cluster and spent a couple of hours trying to load the libraries, but was unable to. Maybe someone else can help you with this.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 22:20:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/out-of-memory-error-when-installing-environment-dependencies-of/m-p/124506#M3608</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-07-08T22:20:24Z</dc:date>
    </item>
  </channel>
</rss>

