
JavaPackage object is not callable - pydeequ

Direo
Contributor

Hi!

When I run a notebook on Databricks, it throws the error "'JavaPackage' object is not callable", which points to the pydeequ library:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/checks.py in __init__(self, spark_session, level, description, constraints)
     91         self._jvm = spark_session._jvm
     92         self.level = level
---> 93         self._java_level = self.level._get_java_object(self._jvm)
     94         self._check_java_class = self._jvm.com.amazon.deequ.checks.Check
     95         self.description = description

/local_disk0/.ephemeral_nfs/envs/pythonEnv-3abbb1aa-ee5b-48da-aaf2-18f273299f52/lib/python3.8/site-packages/pydeequ/checks.py in _get_java_object(self, jvm)
     19             return jvm.com.amazon.deequ.checks.CheckLevel.Error()
     20         if self == CheckLevel.Warning:
---> 21             return jvm.com.amazon.deequ.checks.CheckLevel.Warning()
     22         raise ValueError("Invalid value for CheckLevel Enum")

Spark 3.2.0, Scala 2.12

I believe it has something to do with my runtime version, but I don't want to downgrade it.

Please help me with this.

Thanks

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

@Direo Direo, https://github.com/awslabs/python-deequ/issues/1

You can try installing matching versions of Deequ and pydeequ for your Spark release, as others in that issue have done.
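The root cause discussed in that issue is a version mismatch: the Deequ JAR on the cluster must be built for the cluster's Spark release, and the pydeequ package must match the JAR. As a sketch (the exact coordinates below are an assumption for a Spark 3.2 cluster; check the compatibility table in the python-deequ README for your runtime):

```text
# Cluster libraries (added via the cluster UI):
# Maven:  com.amazon.deequ:deequ:2.0.1-spark-3.2   <- Deequ built against Spark 3.2
# PyPI:   pydeequ==1.1.1                           <- Python bindings matching that JAR
```

When the JAR is missing or built for a different Spark version, the py4j gateway resolves the class name to a JavaPackage placeholder instead of a class, which is exactly what produces the "'JavaPackage' object is not callable" error above.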


3 REPLIES


Kaniz
Community Manager

Hi @Werner Stinckens, thank you for being an amazing contributor to our Community.

JSatiro
New Contributor II

Hi. If you are struggling like I was, these were the steps I followed to make it work:
1 - Create a cluster with Runtime 10.4 LTS, which has Spark version 3.2.1 (it should work with more recent runtimes, but be aware of the Spark version).
2 - When creating the cluster, add the following libraries through the UI:
- Maven library source. Coordinates: com.amazon.deequ:deequ:2.0.1-spark-3.2
- PyPI library source. Package: pydeequ==1.1.1
3 - After the cluster is created, set the Spark version as an environment variable in your notebook, as described in PyDeequ's documentation (os.environ["SPARK_VERSION"] = "3.2").

Then, just import pydeequ, and you're ready to go.
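To make step 3 concrete, here is a minimal sketch of the notebook setup; the pydeequ imports are left commented out because they assume the cluster libraries from step 2 are already installed:

```python
import os

# PyDeequ reads the SPARK_VERSION environment variable to select the matching
# Deequ classes, so set it before importing the library; "3.2" corresponds to
# the Runtime 10.4 LTS cluster described above.
os.environ["SPARK_VERSION"] = "3.2"

# With the Maven JAR and PyPI package from step 2 installed on the cluster,
# the usual imports then work without the JavaPackage error:
# import pydeequ
# from pydeequ.checks import Check, CheckLevel
```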
