NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)

daschl
Contributor

Hi,

I work for Couchbase on the Couchbase Spark Connector and noticed something weird that I haven't been able to get to the bottom of so far.

For query DataFrames we use the Datasource v2 API and delegate the JSON parsing to org.apache.spark.sql.catalyst.json.CreateJacksonParser (https://github.com/couchbase/couchbase-spark-connector/blob/master/src/main/scala/com/couchbase/spark/query/QueryPartitionReader.scala#L56). This all works fine, both in a local IDE setup and when the job is sent to a local distributed Spark setup.
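
For context, the parsing path looks roughly like this. This is a simplified, self-contained sketch (toy schema, hypothetical object name), not the exact connector code; note that it has to live in the org.apache.spark.sql package because CreateJacksonParser and JSONOptions are private[sql], which is why the stack trace below goes through org.apache.spark.sql.CouchbaseJsonUtils:

package org.apache.spark.sql

import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, JacksonParser, JSONOptions}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.unsafe.types.UTF8String

// Simplified sketch: build a catalyst JacksonParser for a toy schema and feed
// it one raw JSON string, the way the connector feeds rows from the query service.
object ParserSketch {
  def main(args: Array[String]): Unit = {
    val schema  = StructType(Seq(StructField("name", StringType)))
    val options = new JSONOptions(Map.empty[String, String], "UTC")
    val parser  = new JacksonParser(schema, options, allowArrayAsStructs = true)

    val rows = parser.parse(
      """{"name":"airline_10"}""",
      CreateJacksonParser.string, // the method the NoSuchMethodError points at
      UTF8String.fromString(_: String)
    )
    rows.foreach(println)
  }
}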

But when I run it in a Databricks notebook, I get:

Job aborted due to stage failure.
Caused by: NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;
	at org.apache.spark.sql.CouchbaseJsonUtils$.$anonfun$createParser$1(CouchbaseJsonUtils.scala:41)
	at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$parse$1(JacksonParser.scala:490)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2952)
	at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:490)
	at com.couchbase.spark.query.QueryPartitionReader.$anonfun$rows$2(QueryPartitionReader.scala:54)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
....
	at java.lang.Thread.run(Thread.java:748)

Any idea why org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser; is not available in this environment?
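
If it helps with triage: a generic JVM reflection check (nothing connector-specific; should be runnable in a Scala notebook cell) can show which overloads of that method actually exist in the environment, and which jar the class was loaded from:

// List the methods present on the loaded CreateJacksonParser singleton and
// print the location of the jar that provided the class (None if it came
// from the bootstrap classloader).
val clazz = Class.forName("org.apache.spark.sql.catalyst.json.CreateJacksonParser$")
clazz.getDeclaredMethods.map(_.toString).sorted.foreach(println)
println(Option(clazz.getProtectionDomain.getCodeSource).map(_.getLocation))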

Thanks,

Michael

1 ACCEPTED SOLUTION

daschl
Contributor

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released with the next minor version (connector 3.2.0). Nonetheless, I still think this is an issue in the Databricks notebook environment and should be addressed on your side?
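
For anyone curious, the shape of the workaround is roughly this; a minimal sketch with a hypothetical package and object name, not the exact copied code:

package com.couchbase.spark.json

import com.fasterxml.jackson.core.{JsonFactory, JsonParser}

// A private copy of the one factory method the connector needs, so method
// resolution no longer depends on which build of Spark's CreateJacksonParser
// the runtime classloader serves up.
object CouchbaseCreateJacksonParser {
  def string(jsonFactory: JsonFactory, record: String): JsonParser =
    jsonFactory.createParser(record)
}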


23 REPLIES

Kaniz
Community Manager

Hi @daschl! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will get back to you soon. Thanks.

daschl
Contributor

@Kaniz Fatma thanks for your reply. Since this question is very implementation-specific and not really about general usage, would it make sense to connect me with an engineer familiar with the environment and the internals of the Datasource v2 API? This could also be via email or a different channel.

Kaniz
Community Manager

Hi @Michael Nitschinger, you can explain the entire question here itself.

It would be great if we could solve it here on the public platform, as that would help many of us.

daschl
Contributor

@Kaniz Fatma the entire question is in the original post; if further clarification is needed, I'm happy to provide it.

AV
New Contributor III

@Kaniz Fatma we would appreciate it if you could assign someone to help us get past this hurdle.

Kaniz
Community Manager

Hi @ARUN VIJAYRAGHAVAN and @Michael Nitschinger,

I hope this may help you in some way.

https://docs.databricks.com/data/data-sources/couchbase.html

daschl
Contributor

@Kaniz Fatma I think you are not quite understanding: we are currently in the process of updating the exact page you linked (we work for Couchbase!), and in updating it to Spark 3 we ran into the issue above. So this is specific to the Databricks notebook platform, since it works with a standalone Spark application. What you are telling us here amounts to "turn it off and on again", and we'd appreciate input from actual Databricks engineers working on that environment. Thank you!

Kaniz
Community Manager

Hi @Michael Nitschinger, I've reached out to the Databricks engineer you requested. He'll be reaching out to you very soon. Thanks.

Atanu
Esteemed Contributor

Hello @Michael Nitschinger, I am not aware of your cluster config, but you may consider updating this jar as a cluster library and see if you are still running into this issue.

Also, please look into this -

You cannot access this data source from a cluster running Databricks Runtime 7.0 or above because a Couchbase connector that supports Apache Spark 3.0 is not available.

https://docs.databricks.com/data/data-sources/couchbase.html

daschl
Contributor

@Atanu Sarkar what do you mean by updating the jar? The Couchbase connector does support Apache Spark 3.0; I wrote the new connector. We are planning to update the page you linked, and in doing so we ran into the issue above. I need someone to help me debug why our Spark connector works under Spark 3 but not in a Databricks notebook.

Hi @Michael Nitschinger,

Are you unblocked, or are you still facing this issue?

Yes, still facing the issue as described above!

User16752239289
Valued Contributor

@Michael Nitschinger Could you let us know your cluster's DBR version?

If you can add the Spark configuration spark.driver.extraJavaOptions -verbose:class to your cluster and run your use case again, the driver stdout logs will show which jar the class org.apache.spark.sql.catalyst.json.CreateJacksonParser is loaded from.

With those two pieces of info, I can decompile the jar and find the root cause of the issue.
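
For reference, the entry in the cluster's Spark config box would look something like this (the standard Spark conf key plus the JVM's class-loading trace flag):

spark.driver.extraJavaOptions -verbose:class

On a JDK 8 runtime, -verbose:class prints lines of the form [Loaded <class> from <jar>] to the driver's stdout, so searching those logs for CreateJacksonParser should reveal which jar supplied the class.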

Answered below, hope that helps!
