11-09-2021 02:44 AM
Hi,
I'm working for Couchbase on the Couchbase Spark Connector and noticed something weird that I haven't been able to get to the bottom of so far.
For query DataFrames we use the Datasource v2 API and delegate the JSON parsing to org.apache.spark.sql.catalyst.json.CreateJacksonParser (https://github.com/couchbase/couchbase-spark-connector/blob/master/src/main/scala/com/couchbase/spark/query/QueryPartitionReader.scala#L56). This all works fine, both in a local IDE setup and when the job is sent to a local Spark distributed setup.
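For context, a boiled-down sketch of that delegation (simplified from the linked QueryPartitionReader; the object name and schema handling are placeholders, and it has to live in the org.apache.spark.sql package because the catalyst JSON helpers are private[sql]):
```scala
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, JacksonParser, JSONOptions}
import org.apache.spark.sql.types.StructType
import org.apache.spark.unsafe.types.UTF8String

// Placeholder name; the real code sits in CouchbaseJsonUtils / QueryPartitionReader.
object JsonDelegationSketch {

  // Turn one JSON row returned by a Couchbase query into Spark InternalRows.
  def parseRow(schema: StructType, json: String): Iterable[InternalRow] = {
    val options = new JSONOptions(Map.empty[String, String], "UTC")
    val parser  = new JacksonParser(schema, options, allowArrayAsStructs = true)
    // This is the call that triggers the NoSuchMethodError below on Databricks:
    // CreateJacksonParser.string(JsonFactory, String): JsonParser
    parser.parse(json, CreateJacksonParser.string _, UTF8String.fromString)
  }
}
```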
But when I run it in a Databricks notebook, I get:
Job aborted due to stage failure.
Caused by: NoSuchMethodError: org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser;
at org.apache.spark.sql.CouchbaseJsonUtils$.$anonfun$createParser$1(CouchbaseJsonUtils.scala:41)
at org.apache.spark.sql.catalyst.json.JacksonParser.$anonfun$parse$1(JacksonParser.scala:490)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2952)
at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:490)
at com.couchbase.spark.query.QueryPartitionReader.$anonfun$rows$2(QueryPartitionReader.scala:54)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
....
at java.lang.Thread.run(Thread.java:748)
Any idea why org.apache.spark.sql.catalyst.json.CreateJacksonParser$.string(Lcom/fasterxml/jackson/core/JsonFactory;Ljava/lang/String;)Lcom/fasterxml/jackson/core/JsonParser; is not available in this environment?
Thanks,
Michael
12-27-2021 07:22 AM
Since there hasn't been any progress on this for over a month, I applied a workaround and copied the classes into the connector source code, so we don't have to rely on the Databricks classloader. It seems to work in my testing and will be released with the next minor version (connector 3.2.0). Nonetheless, I still think this is an issue in the Databricks notebook environment and should be addressed on your side.
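Roughly, the idea looks like this (a hypothetical sketch; the real change copies the full Spark classes, and the package name here is illustrative):
```scala
// Ship a private copy of the helper under the connector's own package so it
// always resolves from the connector jar, regardless of which (possibly
// patched) CreateJacksonParser the runtime's Spark distribution provides.
package com.couchbase.spark.json // hypothetical package name

import com.fasterxml.jackson.core.{JsonFactory, JsonParser}

object CreateJacksonParser {
  def string(jsonFactory: JsonFactory, record: String): JsonParser =
    jsonFactory.createParser(record)
}
```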
11-09-2021 11:03 PM
@Kaniz Fatma thanks for your reply. Since this question is very implementation-specific and not really related to general usage, would it make sense to connect me to an engineer familiar with the environment and the internals of the Datasource v2 API? This can also be via email or a different channel.
11-10-2021 09:08 AM
@Kaniz Fatma the entire question is in the original post. If further clarification is needed, I'm happy to provide it.
11-10-2021 10:48 AM
@Kaniz Fatma I would appreciate it if you could assign someone to help us get past this hurdle.
11-10-2021 11:44 PM
@Kaniz Fatma I think you are not quite understanding: we are currently in the process of updating the exact page you linked (we work for Couchbase!), and in the process of updating to Spark 3 we ran into the issue above. So this is specific to the Databricks notebook platform, since it works with a standalone Spark application. What you are telling us here amounts to "turn it off and on again", and we'd appreciate it if we could get some input from actual Databricks engineers working on that environment. Thank you!
11-12-2021 05:39 AM
Hello @Michael Nitschinger, I am not aware of your cluster config, but you may consider updating this jar as a library and see if you are still running into this issue.
Also, please look into this note from the docs:
"You cannot access this data source from a cluster running Databricks Runtime 7.0 or above because a Couchbase connector that supports Apache Spark 3.0 is not available."
11-12-2021 05:51 AM
@Atanu Sarkar what do you mean by updating the jar? The Couchbase connector does support Apache Spark 3.0; I wrote the new connector. We are planning to update the page you linked, and in doing so we ran into the issue above. I need someone to help me debug why our Spark connector works under Spark 3 but not in a Databricks notebook.
11-29-2021 11:31 AM
Hi @Michael Nitschinger,
Are you unblocked or still facing this issue?
11-29-2021 11:57 AM
Yes, still facing the issue as described above!
12-02-2021 02:20 PM
@Michael Nitschinger Could you let us know your cluster DBR versions?
If you can add the Spark configuration spark.driver.extraJavaOptions -verbose:class to your cluster and run your use case again, the driver stdout logs will show which jar the class org.apache.spark.sql.catalyst.json.CreateJacksonParser is loaded from.
With those two pieces of information, I can decompile the jar and find the root cause of the issue.
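Alternatively, a quick check that can be run straight from a notebook cell (just a sketch; note that getCodeSource can be null for classes loaded by the bootstrap classloader):
```scala
// Print which jar the catalyst JSON helper object was actually loaded from.
val cls = Class.forName("org.apache.spark.sql.catalyst.json.CreateJacksonParser$")
val location = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
println(location.getOrElse("bootstrap/platform classpath (no code source)"))
```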
12-02-2021 11:19 PM
answered below, hope that helps!
12-02-2021 11:18 PM
I've been using DBR 10.1 right now. Here are all the driver logs: https://gist.github.com/daschl/8f3e996caf003a903006fff57d6396e3
If needed, I can also give you access (via email) to the JAR I'm using, as well as access to a Couchbase cluster to actually test it end-to-end.
12-06-2021 09:16 AM
Hello @Xin Wang, thank you for helping us out. Any further updates on this?
12-10-2021 09:18 AM
Hello @Xin Wang, do you have everything you asked for previously? Any ETA on this? Please let us know, we are currently blocked. We'd appreciate a quick turnaround 🙂
12-10-2021 09:27 AM
Hello @ARUN VIJAYRAGHAVAN, I really apologize for the late response. Could you add spark.driver.extraJavaOptions -verbose:class to your local Spark distributed setup, so that we have the same logs as you posted before for the Databricks cluster? I need to compare the local Spark distributed setup against the Databricks cluster.
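For example, if the local job is launched via spark-submit (a sketch; the main class and jar names below are placeholders):
```
# Sketch: class and jar names are placeholders for your actual test job.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-verbose:class" \
  --class com.example.CouchbaseQueryJob \
  your-connector-test.jar
```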