Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)

daschl
Contributor

Hi,

I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird which I haven't been able to get to the bottom of so far.

For query DataFrames we use the DataSource v2 API and delegate the JSON parsing to org.apache.spark.sql.catalyst.json.CreateJacksonParser (https://github.com/couchbase/couchbase-spark-connector/blob/master/src/main/scala/com/couchbase/spark/query/QueryPartitionReader.scala#L56). This all works fine, both in a local IDE setup and when the job is sent to a local Spark distributed setup.

But when I run it in a databricks notebook, I get:

Job aborted due to stage failure.
Caused by: NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;
	at org.apache.spark.sql.CouchbaseJsonUtils$.$anonfun$createParser$1(CouchbaseJsonUtils.scala:41)
	at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$parse$1(JacksonParser.scala:490)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2952)
	at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:490)
	at com.couchbase.spark.query.QueryPartitionReader.$anonfun$rows$2(QueryPartitionReader.scala:54)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
....
	at java.lang.Thread.run(Thread.java:748)

Any idea why org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser; is not available in this environment?

Thanks,

Michael

1 ACCEPTED SOLUTION


daschl
Contributor

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released with the next minor version (connector 3.2.0). Nonetheless, I still think this is an issue in the Databricks notebook environment and should be addressed on your side.


18 REPLIES

daschl
Contributor

@Kaniz Fatma thanks for your reply. Since this question is very implementation-specific and not really related to general usage, would it make sense to connect me with an engineer familiar with the environment and the internals of the DataSource v2 API? This could also be via email or a different channel.

daschl
Contributor

@Kaniz Fatma the entire question is in the original post. If further clarification is needed, I'm happy to provide it.

AV
New Contributor III

@Kaniz Fatma we would appreciate it if you could assign someone to help us get past this hurdle.

daschl
Contributor

@Kaniz Fatma I think you are not quite understanding: we are currently in the process of updating the exact page you linked (we work for Couchbase!), and while updating to Spark 3 we ran into the issue above. So this is specific to the Databricks notebook platform, since it works with a standalone Spark application. What you are telling us here amounts to "turn it off and on again"; we'd appreciate input from actual Databricks engineers working on that environment. Thank you!

Atanu
Databricks Employee

Hello @Michael Nitschinger, I am not aware of your cluster config, but you may consider updating this jar as a library and seeing whether you still run into this issue.

Also, please look into this -

You cannot access this data source from a cluster running Databricks Runtime 7.0 or above because a Couchbase connector that supports Apache Spark 3.0 is not available.

https://docs.databricks.com/data/data-sources/couchbase.html?_ga=2.268369358.1592229266.1636717958-1...

daschl
Contributor

@Atanu Sarkar what do you mean by updating the jar? The Couchbase connector supports Apache Spark 3.0; I wrote the new connector. We are planning to update the page you linked, and in doing so we ran into the issue above. I need someone to help me debug why our Spark connector works under Spark 3 but not in a Databricks notebook.

jose_gonzalez
Databricks Employee

hi @Michael Nitschinger​ ,

Are you unblocked or still facing this issue?

Yes still facing the issue as described above!

User16752239289
Databricks Employee

@Michael Nitschinger​ Could you let us know your cluster DBR versions?

If you can add the Spark configuration spark.driver.extraJavaOptions -verbose:class to your cluster and run your use case again, the driver stdout logs will show which jar the class org.apache.spark.sql.catalyst.json.CreateJacksonParser is loaded from.

With those two pieces of information, I can decompile the jar and find the root cause of the issue.
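The -verbose:class flag mentioned above makes the JVM print a line for every class it loads, including the jar it came from, so the driver stdout can be grepped for the class in question. A sketch of what that looks like (the log lines below are simulated examples, not actual Databricks output; the exact line format also differs between JDK 8 and newer JDKs):

```shell
# Simulated driver stdout containing JDK 8 style -verbose:class lines;
# the jar path is a made-up placeholder:
cat > driver-stdout.log <<'EOF'
[Loaded java.lang.Object from /usr/lib/jvm/java-8/jre/lib/rt.jar]
[Loaded org.apache.spark.sql.catalyst.json.CreateJacksonParser$ from file:/databricks/jars/spark-catalyst_2.12.jar]
EOF

# Find which jar the class was loaded from:
grep "CreateJacksonParser" driver-stdout.log
```

Comparing this line between the Databricks cluster and the working standalone setup shows whether the two environments resolve the class from different jars.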

answered below, hope that helps!

daschl
Contributor

So I've been using DBR 10.1 right now. Here are all the driver logs: https://gist.github.com/daschl/8f3e996caf003a903006fff57d6396e3

If needed, I can also give you access (via email) to the JAR I'm using, as well as access to a Couchbase cluster to actually test it end-to-end.

AV
New Contributor III

Hello @Xin Wang, thank you for helping us out. Any further updates on this?

AV
New Contributor III

Hello @Xin Wang, do you have everything you asked for previously? Any ETA on this? Please let us know; we are currently blocked. Appreciate a quick turnaround 🙂

User16752239289
Databricks Employee

Hello @ARUN VIJAYRAGHAVAN, I really apologize for the late response. Could you add spark.driver.extraJavaOptions -verbose:class to your local Spark distributed setup? That will give us the same logs you posted before for the Databricks cluster, so I can compare the local Spark distributed setup against the Databricks cluster.
