<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Calling a python function (def) in databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18481#M12269</link>
    <description>&lt;P&gt;Not sure if I'm missing something here, but running a task outside of a Python function is much quicker than running the same task inside a function. Is there something I'm missing about how Spark handles functions?&lt;/P&gt;&lt;P&gt;1)&lt;/P&gt;&lt;P&gt;def task(x):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;y = dostuff(x)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return y&lt;/P&gt;&lt;P&gt;2)&lt;/P&gt;&lt;P&gt;y = dostuff(x)&lt;/P&gt;</description>
    <pubDate>Mon, 05 Dec 2022 20:53:47 GMT</pubDate>
    <dc:creator>pjp94</dc:creator>
    <dc:date>2022-12-05T20:53:47Z</dc:date>
    <item>
      <title>Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18481#M12269</link>
      <description>&lt;P&gt;Not sure if I'm missing something here, but running a task outside of a Python function is much quicker than running the same task inside a function. Is there something I'm missing about how Spark handles functions?&lt;/P&gt;&lt;P&gt;1)&lt;/P&gt;&lt;P&gt;def task(x):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;y = dostuff(x)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return y&lt;/P&gt;&lt;P&gt;2)&lt;/P&gt;&lt;P&gt;y = dostuff(x)&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 20:53:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18481#M12269</guid>
      <dc:creator>pjp94</dc:creator>
      <dc:date>2022-12-05T20:53:47Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18482#M12270</link>
      <description>&lt;P&gt;Hi @pjp, could you provide some more information? I'm not aware of any mechanism in Spark that would have such an impact, but an example would make it easier for the community to reproduce the issue, run some benchmarks, and help you.&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;&lt;P&gt;Bartek&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 23:32:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18482#M12270</guid>
      <dc:creator>Bartek</dc:creator>
      <dc:date>2022-12-05T23:32:25Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18483#M12271</link>
      <description>&lt;P&gt;Sure. My function queries an external database (over JDBC) along with a Delta table. I'm not performing any expensive computations, mostly just filtering. When I print timestamps inside the function, I see that most of the time is spent on the latter (the Delta table query and manipulations). I don't know why that is. I even cache the tables when I query them. When I wrap the code in a function, it takes 15 min; when I run the same code outside of a function, it takes 3 min.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 23:49:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18483#M12271</guid>
      <dc:creator>pjp94</dc:creator>
      <dc:date>2022-12-05T23:49:19Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18484#M12272</link>
      <description>&lt;P&gt;UDFs are more expensive than built-in functions in Spark.&lt;/P&gt;&lt;P&gt;That could be the reason for this.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 09:22:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18484#M12272</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2022-12-06T09:22:21Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18485#M12273</link>
      <description>&lt;P&gt;Yes, there is a difference in performance between Python and Scala. Still, @Paras Patel sees the performance penalty while using Python in both cases.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 10:04:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18485#M12273</guid>
      <dc:creator>Bartek</dc:creator>
      <dc:date>2022-12-06T10:04:31Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18486#M12274</link>
      <description>&lt;P&gt;It would be easier if you shared your whole code, @pjp94.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 11:06:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18486#M12274</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-12-06T11:06:48Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18487#M12275</link>
      <description>&lt;P&gt;Assuming that the dostuff you mentioned is a Spark SQL function, you can take a look at this Stack Overflow thread, and the links in it, to get some idea.&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/38296609/spark-functions-vs-udf-performance" target="test_blank"&gt;https://stackoverflow.com/questions/38296609/spark-functions-vs-udf-performance&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 12:40:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18487#M12275</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-06T12:40:11Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18488#M12276</link>
      <description>&lt;P&gt;If you can, convert your Python UDFs to SQL UDFs. These play nicely with adaptive query execution and avoid the performance penalty of Python UDFs.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 22:51:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18488#M12276</guid>
      <dc:creator>huyd</dc:creator>
      <dc:date>2022-12-29T22:51:17Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18489#M12277</link>
      <description>&lt;P&gt;It seems you are using a UDF here. UDFs in Spark are expensive because Spark doesn't know how to optimize them. It's better to avoid them unless you have no other choice.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Jan 2023 14:07:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18489#M12277</guid>
      <dc:creator>ramravi</dc:creator>
      <dc:date>2023-01-02T14:07:19Z</dc:date>
    </item>
    <item>
      <title>Re: Calling a python function (def) in databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18490#M12278</link>
      <description>&lt;P&gt;Don't use a plain Python function; use a UDF in PySpark instead, which will be faster.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jan 2023 06:30:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calling-a-python-function-def-in-databricks/m-p/18490#M12278</guid>
      <dc:creator>sher</dc:creator>
      <dc:date>2023-01-06T06:30:06Z</dc:date>
    </item>
  </channel>
</rss>

