<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: function does not exist in JVM ERROR in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19934#M13440</link>
    <description>&lt;P&gt;Hi @Orianh, have you managed to resolve it? I'm facing the same issue.&lt;/P&gt;</description>
    <pubDate>Thu, 13 Oct 2022 07:01:13 GMT</pubDate>
    <dc:creator>Vickyster</dc:creator>
    <dc:date>2022-10-13T07:01:13Z</dc:date>
    <item>
      <title>function does not exist in JVM ERROR</title>
      <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19932#M13438</link>
      <description>&lt;P&gt;Hello guys,&lt;/P&gt;&lt;P&gt;I'm building a Python package that returns one row at a time from a DataFrame inside a Databricks environment.&lt;/P&gt;&lt;P&gt;To improve the performance of this package, I used Python's multiprocessing library. I have a background process whose whole purpose is to prepare chunks of data (filter the big Spark DataFrame and convert it to pandas, or to a list using collect) and push them to a multiprocessing queue for the main process.&lt;/P&gt;&lt;P&gt;Inside the sub-process I'm using the pyspark.sql.functions module to filter, index and shuffle the big Spark DataFrame, convert it to pandas and push it to the queue.&lt;/P&gt;&lt;P&gt;When I wrote all the objects inside a notebook, ran all the cells and tested my object, everything went fine.&lt;/P&gt;&lt;P&gt;After downloading a wheel file and the package I created from pip, I ran a function from the wheel file that uses my package. An error is thrown and I can't understand why.&lt;/P&gt;&lt;P&gt;From my point of view, for some reason the sub-process is running in an environment that doesn't know pyspark.sql.functions.&lt;/P&gt;&lt;P&gt;Attaching the error I get from the cluster stderr logs:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="function dont exist in JVM error."&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1838i7E8B1AA61839F1CC/image-size/large?v=v2&amp;amp;px=999" role="button" title="function dont exist in JVM error." alt="function dont exist in JVM error." /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I hope you have an idea of how to overcome this error. It would help a lot.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;** &lt;B&gt;If any information is missing, please let me know and I will edit the question&lt;/B&gt; **&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;After more tries and tests, I'm able to run my object when downloading the package from pip, but when I send my object to the Keras fit method, the sub-process can't find pyspark.sql.functions.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 23 May 2022 10:33:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19932#M13438</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-05-23T10:33:10Z</dc:date>
    </item>
    <item>
      <title>Re: function does not exist in JVM ERROR</title>
      <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19933#M13439</link>
      <description>&lt;P&gt;I still haven't managed to solve this. If someone knows how to fix it, it would be really helpful.&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2022 09:18:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19933#M13439</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-05-30T09:18:14Z</dc:date>
    </item>
    <item>
      <title>Re: function does not exist in JVM ERROR</title>
      <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19934#M13440</link>
      <description>&lt;P&gt;Hi @Orianh, have you managed to resolve it? I'm facing the same issue.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2022 07:01:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19934#M13440</guid>
      <dc:creator>Vickyster</dc:creator>
      <dc:date>2022-10-13T07:01:13Z</dc:date>
    </item>
    <item>
      <title>Re: function does not exist in JVM ERROR</title>
      <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19935#M13441</link>
      <description>&lt;P&gt;Hey @Vigneshwaran Ramanathan, nope.&lt;/P&gt;&lt;P&gt;After some tries and performance issues I just gave up on this approach &lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I'm not sure how Databricks runs notebook cells. I think the combination of Spark and multiprocessing causes this error, since Spark uses Java under the hood.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2022 17:05:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/19935#M13441</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-10-25T17:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: function does not exist in JVM ERROR</title>
      <link>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/35492#M25901</link>
      <description>&lt;P&gt;Using threads instead of processes solved the issue for me.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2023 00:37:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/function-does-not-exist-in-jvm-error/m-p/35492#M25901</guid>
      <dc:creator>dineshreddy</dc:creator>
      <dc:date>2023-06-28T00:37:14Z</dc:date>
    </item>
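    <!--
    Editorial sketch of the "threads instead of processes" fix reported in the
    last reply. This is not code from any poster. A plausible reason the fix
    works is that a background thread shares the driver's Python process, and
    with it the existing Spark session, whereas a child process may not have a
    usable connection to the JVM. The names prefetch_chunks and make_chunk are
    hypothetical; in the original scenario make_chunk would presumably wrap the
    Spark filtering and the conversion to pandas.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def prefetch_chunks(make_chunk, n_chunks, maxsize=2):
    """Yield make_chunk(i) for i in range(n_chunks), prepared by a background thread."""
    buf = queue.Queue(maxsize=maxsize)  # bounded, so the worker stays only a little ahead

    def worker():
        try:
            for i in range(n_chunks):
                # Hypothetical real-world body: filter the Spark DataFrame
                # for chunk i and call toPandas() on the result.
                buf.put(make_chunk(i))
        except Exception as exc:
            buf.put(exc)  # hand the failure over to the consumer
        finally:
            buf.put(_DONE)

    # A daemon thread will not keep the interpreter alive if the consumer stops early.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    # Stand-in for the Spark work: each "chunk" is just a small list.
    for chunk in prefetch_chunks(lambda i: [i, i * i], 3):
        print(chunk)
```

    The bounded queue keeps the worker at most maxsize chunks ahead of the
    consumer, which limits memory use when each chunk is a large pandas frame.
    -->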
  </channel>
</rss>

