I am facing an error when trying to read data from any MongoDB collection using the MongoDB Spark Connector v10.x on Databricks 13.x.
The error below appears to originate at line 113 of the MongoDB Spark Connector library (v10.2.0):
java.lang.NoSuchMethodError: org.apache.spark.sql.types.DataType.sameType(Lorg/apache/spark/sql/types/DataType;)Z
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
File <command-3492412077247672>:6
1 mongo_opts = {'connection.uri': conf.mongodb.read_uri,
2 'database': 'setorizacao',
3 'collection': 'outlet',
4 'outputExtendedJson': 'true'}
----> 6 mongo_outl = spark.read.load(format='mongodb', **mongo_opts)
File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
51 )
52 return res
File /databricks/spark/python/pyspark/sql/readwriter.py:314, in DataFrameReader.load(self, path, format, schema, **options)
312 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
313 else:
--> 314 return self._df(self._jreader.load())
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
186 def deco(*a: Any, **kw: Any) -> Any:
187 try:
--> 188 return f(*a, **kw)
189 except Py4JJavaError as e:
190 converted = convert_exception(e.java_exception)
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o1020.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.DataType.sameType(Lorg/apache/spark/sql/types/DataType;)Z
at com.mongodb.spark.sql.connector.schema.InferSchema.lambda$inferSchema$4(InferSchema.java:103)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
at com.mongodb.spark.sql.connector.schema.InferSchema.inferSchema(InferSchema.java:112)
at com.mongodb.spark.sql.connector.schema.InferSchema.inferSchema(InferSchema.java:78)
at com.mongodb.spark.sql.connector.MongoTableProvider.inferSchema(MongoTableProvider.java:60)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:91)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:138)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:333)
at scala.Option.flatMap(Option.scala:271)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:331)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
I have tested every version of the Spark Connector from 10.1.0 through 10.2.0, every Databricks 13 runtime from 13.0 through 13.2, and MongoDB server versions 5 and 6 (Atlas).
For now I am installing the library via the Maven coordinates org.mongodb.spark:mongo-spark-connector_2.12:10.2.0, but I have previously also used the official jar file available at this link.
Version 3.0.2 of the Spark Connector works well for both read and write operations. Write operations also work fine on the 10.x versions of the connector; only reads fail.
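For reference, a write along these lines succeeds on the same 13.x cluster (a minimal sketch; `conf.mongodb.write_uri` and the DataFrame `df` are placeholders for my actual config and data):

```python
# Sketch of a write that works on connector 10.x (Databricks 13.x).
# `df` and `conf.mongodb.write_uri` are placeholders, not real names.
(df.write
   .format("mongodb")
   .option("connection.uri", conf.mongodb.write_uri)
   .option("database", "setorizacao")
   .option("collection", "outlet")
   .mode("append")
   .save())
```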
I have also tried reading the same MongoDB collections from a local Spark setup, and that worked normally. For this I used Spark 3.4.1, Java 11.0.19 (Azul Zulu), and Python 3.10.6 (for PySpark).
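The local test looked roughly like this (a sketch; the connection URI is a placeholder for my Atlas URI):

```python
# Local setup where the same read works: Spark 3.4.1, Java 11, Python 3.10.6.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-local-test")
         # pull the same connector build used on Databricks
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.12:10.2.0")
         .getOrCreate())

df = (spark.read
      .format("mongodb")
      .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster>/")  # placeholder
      .option("database", "setorizacao")
      .option("collection", "outlet")
      .load())

df.printSchema()  # schema inference succeeds here, unlike on Databricks 13.x
```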
The error does not occur on Databricks 12.2 and below.
Configuration to reproduce error:
- Databricks: 13.2 (Apache Spark 3.4.0, Scala 2.12, Python 3.10.6)
- MongoDB Spark Connector: 10.2.0 (Scala 2.12), installed from the Maven coordinates org.mongodb.spark:mongo-spark-connector_2.12:10.2.0
- MongoDB: Atlas free tier (version 6)
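And the failing read itself, pulled out of the traceback above (`conf.mongodb.read_uri` is my Atlas connection string):

```python
# Minimal read that triggers the NoSuchMethodError on Databricks 13.x.
mongo_opts = {
    "connection.uri": conf.mongodb.read_uri,  # Atlas connection string
    "database": "setorizacao",
    "collection": "outlet",
    "outputExtendedJson": "true",
}

mongo_outl = spark.read.load(format="mongodb", **mongo_opts)  # fails during schema inference
```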