NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)

daschl
Contributor

Hi,

I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird that I haven't been able to get to the bottom of so far.

For query DataFrames we use the DataSource v2 API and delegate the JSON parsing to org.apache.spark.sql.catalyst.json.CreateJacksonParser (https://github.com/couchbase/couchbase-spark-connector/blob/master/src/main/scala/com/couchbase/spark/query/QueryPartitionReader.scala#L56). This all works fine, both in a local IDE setup and when the job is submitted to a local Spark distributed setup.

But when I run it in a databricks notebook, I get:

Job aborted due to stage failure.
Caused by: NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;
	at org.apache.spark.sql.CouchbaseJsonUtils$.$anonfun$createParser$1(CouchbaseJsonUtils.scala:41)
	at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$parse$1(JacksonParser.scala:490)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2952)
	at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:490)
	at com.couchbase.spark.query.QueryPartitionReader.$anonfun$rows$2(QueryPartitionReader.scala:54)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
....
	at java.lang.Thread.run(Thread.java:748)

Any idea why org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser; is not available in this environment?
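To narrow down whether the class itself or just that overload is missing, a reflection probe like the following can be run in the affected environment. The class and method names come from the stack trace above; the object name and structure are an illustrative sketch, with no compile-time Spark or Jackson dependency so it runs anywhere:

```scala
// Diagnostic sketch: probe, purely via reflection, whether the exact
// CreateJacksonParser.string(JsonFactory, String) overload from the
// stack trace is resolvable in the current environment.
object ProbeCreateJacksonParser {
  def probe(): String =
    try {
      // The $-suffixed name is the Scala object's class, as in the error.
      val owner = Class.forName("org.apache.spark.sql.catalyst.json.CreateJacksonParser$")
      val jsonFactory = Class.forName("com.fasterxml.jackson.core.JsonFactory")
      owner.getMethod("string", jsonFactory, classOf[String])
      "method present"
    } catch {
      case _: ClassNotFoundException => "class not on classpath"
      case _: NoSuchMethodException  => "class loaded, but that overload is missing"
    }

  def main(args: Array[String]): Unit =
    println(s"CreateJacksonParser probe: ${probe()}")
}
```

Running this both on the driver and inside a task would show whether the notebook environment ships a catalyst build where that overload differs from a stock Spark 3.2.0 distribution.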

Thanks,

Michael

23 REPLIES

daschl
Contributor

I've been using DBR 10.1. Here are all the driver logs: https://gist.github.com/daschl/8f3e996caf003a903006fff57d6396e3

If needed, I can also give you (via email) access to the JAR I'm using, as well as access to a Couchbase cluster to actually test it end-to-end.

AV
New Contributor III

Hello @Xin Wang​, thank you for helping us out. Any further updates on this?

AV
New Contributor III

Hello @Xin Wang​, do you have everything that you had asked for previously? Is there any ETA on this? Please let us know; we are currently blocked and would appreciate a quick turnaround 🙂

User16752239289
Valued Contributor

Hello @ARUN VIJAYRAGHAVAN​, apologies for the late response. Could you add -verbose:class to spark.driver.extraJavaOptions on your local Spark distributed setup? That will give us the same logs you posted before for the Databricks cluster, so I can compare the local setup against the Databricks cluster.
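For reference, the class-loading logs can be enabled for both the driver and the executors when building the session; a minimal sketch, assuming a standard SparkSession setup (the config keys are standard Spark; the app name is illustrative):

```scala
// Config sketch: enable JVM class-loading logs on driver and executors.
// spark.driver.extraJavaOptions / spark.executor.extraJavaOptions are
// standard Spark configuration keys.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("classload-debug") // illustrative name
  .config("spark.driver.extraJavaOptions", "-verbose:class")
  .config("spark.executor.extraJavaOptions", "-verbose:class")
  .getOrCreate()
```

The same options can equally be passed on the spark-submit command line via --conf, as shown in the reply below from the local reproduction.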

Here you go: https://gist.github.com/daschl/c2528af17af727d0688f4366d2177498. I ran a local master and worker and then submitted the same app into it with spark-submit. Note that our connector in this case is provided via --jars:

(./spark-submit --jars ~/code/couchbase-spark-connector/target/scala-2.12/spark-connector-assembly-3.2.0-SNAPSHOT.jar --conf "spark.executor.extraJavaOptions=-verbose:class" --master spark://machine.local:7077 ~/code/scala/spark3-examples/target/scala-2.12/spark3-examples_2.12-1.0.0-SNAPSHOT.jar)

From the logs I can see that locally this is loaded:

[7.160s][info][class,load] org.apache.spark.sql.catalyst.json.CreateJacksonParser$ source: file:/Users/myuser/Downloads/spark-3.2.0-bin-hadoop3.2/jars/spark-catalyst_2.12-3.2.0.jar

But it seems to be missing in the databricks environment.

AV
New Contributor III

Hello @Xin Wang​, thank you so much for your earlier response, totally understood. I think @Michael Nitschinger​ has provided you with the information that's needed. Should that suffice for debugging? Please let us know; this is a blocker for us and our customers.

daschl
Contributor

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released with the next minor version (connector 3.2.0). Nonetheless, I still think this is an issue in the Databricks notebook environment and should be addressed on your side?
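The workaround amounts to compiling a copy of the parser entry point into the connector's own namespace, so the connector's bytecode links against the copy shipped in its own JAR rather than whatever catalyst build the platform classloader serves up. A minimal sketch of the idea, assuming only jackson-core on the classpath (the object name is illustrative, not the actual connector layout; in the real connector this would live under its own package):

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser}

// Vendored stand-in for the string overload of
// org.apache.spark.sql.catalyst.json.CreateJacksonParser: because this
// object is compiled into the connector JAR, call sites in the connector
// resolve this copy regardless of which catalyst build the platform loads.
object VendoredJacksonParser {
  def string(jsonFactory: JsonFactory, record: String): JsonParser =
    jsonFactory.createParser(record)
}
```

The trade-off is that the vendored copy must be kept in sync manually with upstream catalyst changes across Spark versions.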

Kaniz
Community Manager

Thanks for flagging @Michael Nitschinger​ .
