11-15-2021 05:38 PM
On latest DB-Connect==9.1.3 and dbr == 9.1, retrieving data from mongo using Maven coordinate of Mongo Spark Connector: org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 - https://docs.mongodb.com/spark-connector/current/ - working fine previously throws
...
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.ClassNotFoundException: Failed to find data source: mongo. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:765)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:819)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:101)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:80)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.service.SparkServiceRPCHandler$.getOrLoadAnonymousRelation(SparkServiceRPCHandler.scala:80)
at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:715)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:478)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:370)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:321)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:307)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:357)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:334)
at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: mongo.DefaultSource
at java.lang.ClassLoader.findClass(ClassLoader.java:524)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:739)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:739)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:739)
... 47 more
reverting DBC to version 8 won't really help since the supported dbr version (8.1) is no longer available. reverting DBC to version 7 throws other issue.
Any workarounds / solutions?
01-24-2022 11:52 PM
11-15-2021 06:40 PM
I have exactly the same problem with my databricks-connect 9.1.2. Also tried explicit format name instead of 'mongo' but it didn't work. Please help!
spark.read.format('com.mongodb.spark.sql.DefaultSource')
11-16-2021 12:24 AM
Hi @tigger and Shadowsong27! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.
11-21-2021 04:16 PM
Hi @Yikun Song , Can you please attach the snap of your code?
11-22-2021 07:42 AM
Hi @Kaniz Fatma
The code is simple and it worked on a databricks notebook:
(spark.read.format('mongo')
.option("uri", uri)
.option("database", "abc")
.option("collection", "xyz")
.load()).display()
Is it possible to run this code via databricks-connect?
11-22-2021 04:39 PM
@Kaniz Fatma Mine is very very similar to @Hugh Vo , it's just a standard spark read using `mongo` format.
11-22-2021 01:12 PM
@Yikun Song - Would you be happy to mark which answer is best so that others can find the solution more easily?
11-22-2021 04:38 PM
i am more than happy too, but there is no answer yet.
11-22-2021 05:47 PM
@Yikun Song - Thank you.
12-20-2021 10:21 AM
Hi @Yikun Song
We found an issue with it that will be fixed in the next round of patches (to be released mid-January).
As a workaround,
So adding the assembly jar to --jars, and first running a SELECT 1 query to make sure that it gets synced to the server should be a temp working workaround.
01-24-2022 11:52 PM
Folks the latest databricks-connect==9.1.7 fixed this.
01-25-2022 12:07 AM
Awesome!
01-29-2022 04:21 AM
It works now. Thanks!
06-20-2023 11:32 PM
Hi, I encountered the same issue with your problem. I know it's late, but do u still have a copy of your code so I can try it also with mine? Thanks so much 🙂
07-24-2023 06:53 AM - edited 07-25-2023 03:55 AM
Hi everyone the solution for me it was to replace spark.read.format("mongo") by spark.read.format("mongodb") my spark version is 3.3.2 and my mongodb version is 6.0.6 .
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group