
Getting Py4J "Could not find py4j jar" error when trying to use pypmml, solution doesn't work

mattsteinpreis
New Contributor III

I'm trying to use pypmml in a Databricks notebook, but I'm hitting the known `Py4JError: Could not find py4j jar at` error. I've followed the solution here: https://kb.databricks.com/libraries/pypmml-fail-find-py4j-jar.html. However, it has not worked for me.

Details:

  • When I run `%pip install py4j==0.10.9` followed by `%sh find /databricks/ -name "py4j*jar"`, no results are found. However, if I install to the cluster via the Compute UI, then I do find the jar in the expected path.
  • I move the jar via:
dbutils.fs.cp('/databricks/python3/share/py4j/py4j0.10.9.jar', '/py4j/')

  • I create the init script, like so:
dbutils.fs.put("/<my-path>/install-py4j-jar.sh", """
#!/bin/bash
mkdir -p /share/py4j/ /current-release/
cp /dbfs/py4j/py4j0.10.9.jar /share/py4j/
cp /dbfs/py4j/py4j0.10.9.jar /current-release
""", True)
  • I attach the init script to the cluster and restart it.
  • I install pypmml and run something like:
from pypmml import Model
model = Model.load('/dbfs/<my-path>/<my-model>.pmml')

  • I've tried installing pypmml using %pip as well as in the cluster UI.

No matter what, I always get the same error: `Py4JError: Could not find py4j jar at`

I'm using DBR 10.4 LTS ML, though I've tried other versions to no avail.
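As a sanity check, something like this should show what the copy and the init script actually produced on the driver (the directories below are just the ones from the steps above, not an authoritative list of the locations py4j searches):

import os

# List which of the candidate directories exist on the driver and what they contain.
for path in ["/share/py4j", "/current-release", "/databricks/python3/share/py4j", "/dbfs/py4j"]:
    print(path, "->", os.listdir(path) if os.path.isdir(path) else "missing")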

Any ideas?

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

To avoid a conflict with the preinstalled version, py4j needs to be installed via `%pip install py4j==0.10.9`.

You can check where it is installed this way:

%sh
pip install py4j==0.10.9
pip show py4j

This didn't fix the problem.

When I do this, pypmml does "see" this version: when I later install pypmml, pip skips the py4j requirement and reports the py4j location from the show command. However, I still get the same error.

I don't know how to make pypmml know where to look to find the right py4j jar.
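If it helps, py4j itself exposes the lookup that is failing here; find_jar_path is an internal helper rather than a stable API, so treat this as a debugging sketch only:

from py4j.java_gateway import find_jar_path

# Returns the jar path py4j resolved, or an empty string if none of the locations
# it searches contained the jar -- which is what triggers "Could not find py4j jar at".
print(repr(find_jar_path()))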

Also, even when I install py4j this way, the Databricks environment still seems to point to a different py4j installation. If I run:

import py4j
print(py4j.__version__)
print(py4j.__file__)

I get a different version and path than what was specified/returned from the install commands.
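One possible explanation (just a guess): PySpark loads py4j when the notebook session starts, so the already-imported module may keep pointing at the preinstalled copy regardless of what %pip installs afterwards. A quick way to see whether that is happening:

import sys

# If "py4j" was already imported when the notebook started (PySpark pulls it in),
# the cached module is what `import py4j` returns, regardless of later pip installs.
print("py4j" in sys.modules)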

Hi @Matthew Steinpreis,

Just a friendly follow-up: are you still looking for help? Please let us know.

pawelmitrus
Contributor

I've been struggling with this myself, but after installing pypmml for Spark, I can use that library instead, so maybe it will work for you:

Both the PySpark and Scala APIs work.
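For reference, a minimal sketch of what that looks like, assuming this refers to the pypmml-spark package (installed with %pip install pypmml-spark in its own cell; check the package's install notes for getting its jars onto the cluster). It relies on the Databricks notebook's built-in spark session and display(), the placeholder PMML path from above, and made-up input columns:

from pypmml_spark import ScoreModel

# Load the PMML file from the DBFS FUSE mount (placeholder path).
model = ScoreModel.fromFile('/dbfs/<my-path>/<my-model>.pmml')

# Hypothetical input DataFrame; real column names must match the model's input fields.
input_df = spark.createDataFrame(
    [(5.1, 3.5, 1.4, 0.2)],
    ["sepal_length", "sepal_width", "petal_length", "petal_width"],
)

# transform() appends the model's output fields as columns.
scored_df = model.transform(input_df)
display(scored_df)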