function does not exist in JVM ERROR
05-23-2022 03:33 AM
Hello guys,
I'm building a Python package that returns one row at a time from a DataFrame inside the Databricks environment.
To improve the performance of this package, I used Python's multiprocessing library.
I have a background process whose whole purpose is to prepare chunks of data (filter the big Spark DataFrame and convert it to pandas or to a list using collect) and push them onto a multiprocessing queue for the main process.
Inside the sub-process I'm using the pyspark.sql.functions module to filter, index, and shuffle the big Spark DataFrame, convert it to pandas, and push it to the queue.
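Roughly, the producer pattern looks like the sketch below (a minimal, hypothetical version: big_df, the filter, and the batch size are placeholders, not my actual package code):

```python
# Sketch of the background-producer pattern described above.
# NOTE: placeholder names; this is the pattern that later triggers the error.
import multiprocessing as mp

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Stand-in for the "big spark df" from the question.
big_df = spark.range(100_000).withColumn("label", F.rand())


def producer(chunk_queue, batch_size):
    # Background process: filter and shuffle the big Spark DataFrame,
    # convert one chunk to pandas, and push it to the queue.
    chunk = (
        big_df
        .filter(F.col("label") > 0.5)  # example filter
        .orderBy(F.rand())             # shuffle
        .limit(batch_size)
        .toPandas()
    )
    chunk_queue.put(chunk)


if __name__ == "__main__":
    q = mp.Queue()
    proc = mp.Process(target=producer, args=(q, 1024))
    proc.start()
    rows = q.get()  # main process consumes one chunk at a time
    proc.join()
```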
When I wrote all the objects inside a notebook, ran all the cells, and tested my object, everything went fine.
But after building a wheel file, installing the package I created from pip, and running a function from the wheel that uses my package, an error is thrown and I can't understand why.
From my point of view, for some reason the sub-process is running in an environment where it doesn't know pyspark.sql.functions.
Attaching the error I get from the cluster stderr logs:
I hope you guys have an idea of how to overcome this error.
That would help a lot.
Thanks.
** If any information is missing, please let me know and I will edit the question **
- Update: after more tries and tests, I'm able to run my object when installing the package from pip, but when I pass my object to Keras's fit method, the sub-process can't find pyspark.sql.functions.
- Labels: Jvm, Python package
05-30-2022 02:18 AM
Still didn't manage to solve it. If someone knows how to fix it, that would be really helpful.
10-13-2022 12:01 AM
Hi @Orianh, have you managed to resolve it? I'm facing the same issue.
10-25-2022 10:05 AM
Hey @Vigneshwaran Ramanathan, nope.
After some tries and performance issues, I just gave up on this approach 😅
I'm not sure exactly how Databricks runs notebook cells, but I think the combination of Spark and multiprocessing causes this error: PySpark talks to the JVM through a Py4J gateway under the hood, and a forked sub-process doesn't inherit a usable connection to that gateway.
06-27-2023 05:37 PM
Using threads instead of processes solved the issue for me.
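For reference, a minimal sketch of the same producer pattern with a thread (the DataFrame, filter, and batch size are placeholders again): a threading.Thread runs inside the driver process and shares its Py4J gateway, so pyspark.sql.functions stays usable inside the worker, unlike in a forked process.

```python
# Same producer pattern as above, but with a thread instead of a process.
# Placeholder names; a sketch, not a drop-in implementation.
import threading
from queue import Queue

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
big_df = spark.range(100_000).withColumn("label", F.rand())
chunk_queue = Queue(maxsize=4)  # bounded so the producer doesn't run ahead


def producer(batch_size):
    # Runs in a thread inside the driver process, so the existing
    # SparkSession and its JVM gateway are still available here.
    chunk = (
        big_df
        .filter(F.col("label") > 0.5)
        .orderBy(F.rand())
        .limit(batch_size)
        .toPandas()
    )
    chunk_queue.put(chunk)


worker = threading.Thread(target=producer, args=(1024,), daemon=True)
worker.start()
rows = chunk_queue.get()  # main thread consumes one chunk at a time
worker.join()
```

One trade-off to note: because of the GIL, a thread only helps while the work is waiting on Spark/JVM calls, which is usually the case for toPandas-style chunk preparation.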

