MongoDB Spark Connector v10.x read error on Databricks 13.x

silvadev
New Contributor III

I am facing an error when trying to read data from any MongoDB collection using the MongoDB Spark Connector v10.x on Databricks 13.x.

The error below appears to originate at line #113 of the MongoDB Spark Connector library (v10.2.0):

java.lang.NoSuchMethodError: org.apache.spark.sql.types.DataType.sameType(Lorg/apache/spark/sql/types/DataType;)Z
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-3492412077247672>:6
      1 mongo_opts = {'connection.uri': conf.mongodb.read_uri, 
      2               'database': 'setorizacao',
      3               'collection': 'outlet',
      4               'outputExtendedJson': 'true'}
----> 6 mongo_outl = spark.read.load(format='mongodb', **mongo_opts)

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     46 start = time.perf_counter()
     47 try:
---> 48     res = func(*args, **kwargs)
     49     logger.log_success(
     50         module_name, class_name, function_name, time.perf_counter() - start, signature
     51     )
     52     return res

File /databricks/spark/python/pyspark/sql/readwriter.py:314, in DataFrameReader.load(self, path, format, schema, **options)
    312     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    313 else:
--> 314     return self._df(self._jreader.load())

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
    186 def deco(*a: Any, **kw: Any) -> Any:
    187     try:
--> 188         return f(*a, **kw)
    189     except Py4JJavaError as e:
    190         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o1020.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.DataType.sameType(Lorg/apache/spark/sql/types/DataType;)Z
	at com.mongodb.spark.sql.connector.schema.InferSchema.lambda$inferSchema$4(InferSchema.java:103)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
	at com.mongodb.spark.sql.connector.schema.InferSchema.inferSchema(InferSchema.java:112)
	at com.mongodb.spark.sql.connector.schema.InferSchema.inferSchema(InferSchema.java:78)
	at com.mongodb.spark.sql.connector.MongoTableProvider.inferSchema(MongoTableProvider.java:60)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:91)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:138)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:333)
	at scala.Option.flatMap(Option.scala:271)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:331)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)

I have tested every version of the Spark Connector from 10.1.0 to 10.2.0, and every Databricks 13 runtime from 13.0 to 13.2. I have also tested against MongoDB server versions 5 and 6 (Atlas).

For now I am using the library via the Maven coordinates org.mongodb.spark:mongo-spark-connector_2.12:10.2.0, but I have previously also used the official jar file available at this link.

Version 3.0.2 of the Spark Connector works well for both read and write operations. Write operations also work fine with the 10.x versions of the connector.
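
For reference, a write cell shaped roughly like the one below runs without the error on the same cluster. This is only a sketch: conf.mongodb.write_uri is a placeholder from my own configuration (mirroring the read URI in the traceback), and the tiny DataFrame just stands in for real data.

# minimal write sketch -- the connector 10.x write path works on DBR 13.x
df = spark.createDataFrame([(1, 'outlet-a'), (2, 'outlet-b')], ['id', 'name'])

(df.write
   .format('mongodb')
   .mode('append')
   .option('connection.uri', conf.mongodb.write_uri)  # placeholder URI
   .option('database', 'setorizacao')
   .option('collection', 'outlet')
   .save())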

I have also tried reading the same MongoDB collections from a local Spark setup, and that worked normally. For this test I used Spark 3.4.1, Java 11.0.19 (Azul Zulu), and Python 3.10.6 (for PySpark).
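
The local test was essentially the following (a sketch of what I ran; the connection string is redacted and the session settings are just my local defaults):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('mongo-connector-local-test')
         # same connector coordinates as on Databricks, pulled from Maven
         .config('spark.jars.packages',
                 'org.mongodb.spark:mongo-spark-connector_2.12:10.2.0')
         .getOrCreate())

df = (spark.read
      .format('mongodb')
      .option('connection.uri', '<atlas-read-uri>')  # redacted
      .option('database', 'setorizacao')
      .option('collection', 'outlet')
      .load())

df.printSchema()  # schema inference completes locally on Spark 3.4.1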

The error does not occur on Databricks 12.2 and below.

Configuration to reproduce the error (a minimal repro cell follows the list):

  • Databricks: 13.2 (Apache Spark 3.4.0, Scala 2.12, Python 3.10.6)
  • MongoDB Spark Connector: 10.2.0 (Scala 2.12)
    Maven Coordinates: org.mongodb.spark:mongo-spark-connector_2.12:10.2.0
  • MongoDB: Atlas free tier (version 6).
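
For convenience, here is the failing notebook cell in one piece (the same code shown in the traceback above; the connection URI is replaced by a placeholder):

mongo_opts = {'connection.uri': '<atlas-read-uri>',  # placeholder
              'database': 'setorizacao',
              'collection': 'outlet',
              'outputExtendedJson': 'true'}

# fails with java.lang.NoSuchMethodError during schema inference on DBR 13.0-13.2
mongo_outl = spark.read.load(format='mongodb', **mongo_opts)
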
1 ACCEPTED SOLUTION


silvadev
New Contributor III

The problem was fixed in Databricks Runtime 13.3 LTS.

