11-15-2021 05:38 PM
On the latest databricks-connect==9.1.3 with DBR 9.1, retrieving data from Mongo via the Mongo Spark Connector Maven coordinate org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 (https://docs.mongodb.com/spark-connector/current/), which previously worked fine, now throws:
...
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.ClassNotFoundException: Failed to find data source: mongo. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:765)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:819)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:101)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:80)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.service.SparkServiceRPCHandler$.getOrLoadAnonymousRelation(SparkServiceRPCHandler.scala:80)
at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:715)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:478)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:370)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:321)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:307)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:357)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:334)
at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: mongo.DefaultSource
at java.lang.ClassLoader.findClass(ClassLoader.java:524)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:739)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:739)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:739)
... 47 more
Reverting databricks-connect to version 8 won't really help, since the supported DBR version (8.1) is no longer available, and reverting to version 7 throws a different issue.
Any workarounds / solutions?
11-15-2021 06:40 PM
I have exactly the same problem with databricks-connect 9.1.2. I also tried the explicit format name instead of 'mongo', but it didn't work. Please help!
spark.read.format('com.mongodb.spark.sql.DefaultSource')
11-22-2021 07:42 AM
Hi @Kaniz Fatma
The code is simple, and it worked in a Databricks notebook:
(spark.read.format('mongo')
.option("uri", uri)
.option("database", "abc")
.option("collection", "xyz")
.load()).display()
Is it possible to run this code via databricks-connect?
11-22-2021 04:39 PM
@Kaniz Fatma Mine is very similar to @Hugh Vo's; it's just a standard Spark read using the `mongo` format.
11-22-2021 01:12 PM
@Yikun Song - Would you be happy to mark which answer is best so that others can find the solution more easily?
11-22-2021 04:38 PM
I am more than happy to, but there is no answer yet.
11-22-2021 05:47 PM
@Yikun Song - Thank you.
12-20-2021 10:21 AM
Hi @Yikun Song
We found an issue with it that will be fixed in the next round of patches (to be released mid-January).
As a workaround, add the connector's assembly jar to --jars, then run a SELECT 1 query first to make sure the jar gets synced to the server. That should work as a temporary workaround.
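If it helps anyone, here is a rough sketch of what that workaround could look like through databricks-connect. The jar path, URI, and database/collection names are placeholders I made up, not values confirmed in this thread:
from pyspark.sql import SparkSession

# Attach the Mongo Spark Connector assembly jar when building the session
# (placeholder path; use the assembly jar you downloaded or built).
spark = (SparkSession.builder
    .config("spark.jars", "/path/to/mongo-spark-connector-assembly_2.12-3.0.1.jar")
    .getOrCreate())

# Run a trivial query first so databricks-connect syncs the jar to the server.
spark.sql("SELECT 1").collect()

# The mongo data source should now resolve.
uri = "mongodb://user:password@host:27017"  # placeholder connection string
df = (spark.read.format("mongo")
    .option("uri", uri)
    .option("database", "abc")
    .option("collection", "xyz")
    .load())
df.show()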
01-24-2022 11:52 PM
Folks, the latest databricks-connect==9.1.7 fixed this.
01-29-2022 04:21 AM
It works now. Thanks!
06-20-2023 11:32 PM
Hi, I encountered the same issue as you. I know it's late, but do you still have a copy of your code so I can try it with mine? Thanks so much!
07-24-2023 06:53 AM - edited 07-25-2023 03:55 AM
Hi everyone, the solution for me was to replace spark.read.format("mongo") with spark.read.format("mongodb"). My Spark version is 3.3.2 and my MongoDB version is 6.0.6.
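For reference, a minimal sketch of the newer usage: in Mongo Spark Connector 10.x (the Spark 3.x line) the short format name changed from "mongo" to "mongodb", and the connection string option is connection.uri. The URI, database, and collection below are placeholders:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Mongo Spark Connector 10.x style read; placeholder URI, database, and collection.
df = (spark.read.format("mongodb")
    .option("connection.uri", "mongodb://host:27017")
    .option("database", "abc")
    .option("collection", "xyz")
    .load())
df.show()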