Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

JavaPackage object is not callable - pydeequ

Direo
Contributor

Hi!

When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/checks.py in __init__(self, spark_session, level, description, constraints)
     91     self._jvm = spark_session._jvm
     92     self.level = level
---> 93     self._java_level = self.level._get_java_object(self._jvm)
     94     self._check_java_class = self._jvm.com.amazon.deequ.checks.Check
     95     self.description = description

/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/checks.py in _get_java_object(self, jvm)
     19         return jvm.com.amazon.deequ.checks.CheckLevel.Error()
     20     if self == CheckLevel.Warning:
---> 21         return jvm.com.amazon.deequ.checks.CheckLevel.Warning()
     22     raise ValueError("Invalid value for CheckLevel Enum")

Spark 3.2.0, Scala 2.12

I believe it has something to do with my runtime version, but I don't want to downgrade it.

Please help me with this.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

@Direo, see https://github.com/awslabs/python-deequ/issues/1

You can try installing the matching deequ and pydeequ versions, as others in that issue have done.

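For context on the error message itself: py4j (which pydeequ uses to call into the JVM) resolves a dotted name like jvm.com.amazon.deequ.checks.CheckLevel to a JavaPackage object whenever the class cannot be found on the cluster's classpath, and a JavaPackage is not callable. A minimal pure-Python stand-in (an illustrative model, not the real py4j classes) reproduces the exact message:

```python
class JavaPackage:
    """Illustrative stand-in for py4j.java_gateway.JavaPackage.

    In py4j, attribute access on a package the JVM cannot resolve
    simply yields another JavaPackage, and packages are not callable.
    """
    def __getattr__(self, name):
        return JavaPackage()

jvm = JavaPackage()  # plays the role of spark_session._jvm without the deequ jar
try:
    # Mirrors pydeequ's lookup: with the jar missing, CheckLevel.Error
    # resolves to a JavaPackage, so calling it raises TypeError.
    jvm.com.amazon.deequ.checks.CheckLevel.Error()
except TypeError as exc:
    print(exc)  # 'JavaPackage' object is not callable
```

This is also why installing the matching deequ Maven jar fixes the problem: once the class is on the classpath, py4j hands back a callable JavaClass instead of a JavaPackage.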

2 REPLIES


JSatiro
New Contributor II

Hi. If you are struggling like I was, these are the steps I followed to make it work:
1 - Create a cluster with Runtime 10.4 LTS, which ships Spark 3.2.1 (more recent runtimes should work too, but check their Spark version).
2 - When creating the cluster, add the following libraries through the UI:
- Maven library source. Coordinates: com.amazon.deequ:deequ:2.0.1-spark-3.2
- PyPI library source. Package: pydeequ==1.1.1
3 - After the cluster is created, set the Spark version in an environment variable in your notebook, as described in PyDeequ's documentation: os.environ["SPARK_VERSION"] = "3.2"

Then just import pydeequ, and you're ready to go.
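Putting step 3 together, a minimal notebook preamble might look like this (a sketch assuming the two libraries from step 2 are already installed on the cluster):

```python
import os

# PyDeequ reads SPARK_VERSION to select the matching deequ artifact,
# so set it *before* importing pydeequ (per PyDeequ's documentation).
os.environ["SPARK_VERSION"] = "3.2"

# On a cluster with com.amazon.deequ:deequ:2.0.1-spark-3.2 (Maven) and
# pydeequ==1.1.1 (PyPI) installed, the import now succeeds:
# import pydeequ
# from pydeequ.checks import Check, CheckLevel
```

The imports are left commented because they only resolve on a cluster where both libraries are installed; locally, only the environment variable line runs.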

 
