Databricks Community

daschl · ‎11-09-2021

Hi,

I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird which I haven't been able to get to the bottom of so far.

For query DataFrames we use the DataSource V2 API and delegate the JSON parsing to org.apache.spark.sql.catalyst.json.CreateJacksonParser (https://github.com/couchbase/couchbase-spark-connector/blob/master/src/main/scala/com/couchbase/spark/query/QueryPartitionReader.scala#L56). This all works fine, both in a local IDE setup and when the job is submitted to a local distributed Spark setup.

But when I run it in a Databricks notebook, I get:

Job aborted due to stage failure.
Caused by: NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;
	at org.apache.spark.sql.CouchbaseJsonUtils$.$anonfun$createParser$1(CouchbaseJsonUtils.scala:41)
	at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$parse$1(JacksonParser.scala:490)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2952)
	at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:490)
	at com.couchbase.spark.query.QueryPartitionReader.$anonfun$rows$2(QueryPartitionReader.scala:54)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
....
	at java.lang.Thread.run(Thread.java:748)

Any idea why the method named in the NoSuchMethodError, org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;, is not available in this environment?

Thanks,

Michael
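
For anyone hitting the same error: a NoSuchMethodError means the JVM found the class but not a method with that exact name and descriptor. One way to narrow this down is to list the overloads of string that the runtime's CreateJacksonParser$ actually exposes and compare them with the (JsonFactory, String) signature in the trace. Below is a minimal sketch using plain Java reflection. The class name MethodSignatures and the helper overloads are made up for illustration and are not part of Spark or the connector. Whether the Databricks runtime ships a different signature is an assumption this check would test, not something confirmed in this thread.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of Spark or the Couchbase connector.
final class MethodSignatures {

    /** Returns the sorted signatures of all public methods on cls with the given name. */
    static List<String> overloads(Class<?> cls, String methodName) {
        List<String> found = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                // Method.toString() includes the return type and parameter types.
                found.add(m.toString());
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults mirror the class and method named in the stack trace above.
        String className = args.length > 0
                ? args[0]
                : "org.apache.spark.sql.catalyst.json.CreateJacksonParser$";
        String methodName = args.length > 1 ? args[1] : "string";

        List<String> signatures = overloads(Class.forName(className), methodName);
        if (signatures.isEmpty()) {
            System.out.println("No public method named " + methodName + " on " + className);
        }
        for (String signature : signatures) {
            System.out.println(signature);
        }
    }
}
```

Run it with the same classpath as the failing job. Since these are plain JVM reflection calls, the equivalent calls should also work from a Scala notebook cell.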

daschl · ‎12-13-2021

Here you go: https://gist.github.com/daschl/c2528af17af727d0688f4366d2177498. I ran a local master and worker and then submitted the same app to it with spark-submit. Note that in this case our connector is provided via --jars:

./spark-submit --jars ~/code/couchbase-spark-connector/target/scala-2.12/spark-connector-assembly-3.2.0-SNAPSHOT.jar --conf "spark.executor.extraJavaOptions=-verbose:class" --master spark://machine.local:7077 ~/code/scala/spark3-examples/target/scala-2.12/spark3-examples_2.12-1.0.0-SNAPSHOT.jar

daschl · ‎12-13-2021

From the logs I can see that locally this is loaded:

[7.160s][info][class,load] org.apache.spark.sql.catalyst.json.CreateJacksonParser$ source: file:/Users/myuser/Downloads/spark-3.2.0-bin-hadoop3.2/jars/spark-catalyst_2.12-3.2.0.jar

But it seems to be missing in the Databricks environment.
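
The same question that -verbose:class answers in the log line above (which jar a class came from) can also be asked at runtime through the class's CodeSource. This is a minimal sketch, assuming it runs in the same JVM and classpath as the job. ClassOrigin and locate are illustrative names, not Spark or Databricks APIs, and the placeholder string is an arbitrary choice for classes whose origin the JVM does not report.

```java
import java.security.CodeSource;

// Hypothetical helper: not part of Spark or the Couchbase connector.
final class ClassOrigin {

    /** Returns the URL a class was loaded from, or a placeholder when none is reported. */
    static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource.
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Default mirrors the class named in the log line above.
        String className = args.length > 0
                ? args[0]
                : "org.apache.spark.sql.catalyst.json.CreateJacksonParser$";
        System.out.println(className + " loaded from " + locate(Class.forName(className)));
    }
}
```

Comparing the printed location between the local cluster and the notebook environment would show whether both resolve the class from the same spark-catalyst jar.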

AV · ‎12-13-2021

Hello @Xin Wang, thank you so much for your earlier response. Totally understood. I think @Michael Nitschinger has provided the information that's needed. Is that sufficient for debugging? Please let us know; this is a blocker for us and our customers.

daschl · ‎12-27-2021

Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released with the next minor version (connector 3.2.0). Nonetheless, I still think this is an issue in the Databricks notebook environment. Shouldn't it be addressed on your side?


NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser on Databricks Cloud (but not on Spark Directly)
