11-15-2021 05:38 PM
On the latest databricks-connect==9.1.3 with DBR 9.1, retrieving data from MongoDB via the Mongo Spark Connector (Maven coordinate org.mongodb.spark:mongo-spark-connector_2.12:3.0.1, docs: https://docs.mongodb.com/spark-connector/current/), which worked fine previously, now throws:
...
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
: java.lang.ClassNotFoundException: Failed to find data source: mongo. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:765)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:819)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:101)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:80)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.service.SparkServiceRPCHandler$.getOrLoadAnonymousRelation(SparkServiceRPCHandler.scala:80)
at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:715)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:478)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:370)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:321)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:307)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:357)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:334)
at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: mongo.DefaultSource
at java.lang.ClassLoader.findClass(ClassLoader.java:524)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:739)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:739)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:739)
... 47 more
Reverting databricks-connect to version 8 won't really help, since the supported DBR version (8.1) is no longer available, and reverting to version 7 throws a different issue.
Any workarounds / solutions?
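For reference, the failing read is a standard mongo-format load along these lines (assuming an active SparkSession named spark; the URI, database, and collection are placeholders):
# Minimal sketch of the failing read; connection details are placeholders.
df = (spark.read.format("mongo")
    .option("uri", "mongodb://host:27017")
    .option("database", "abc")
    .option("collection", "xyz")
    .load())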
11-15-2021 06:40 PM
I have exactly the same problem with databricks-connect 9.1.2. I also tried the explicit class name instead of 'mongo', but it didn't work. Please help!
spark.read.format('com.mongodb.spark.sql.DefaultSource')
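In full, that attempt looks roughly like this (assuming an active SparkSession named spark; the URI and names are placeholders), and it fails with the same ClassNotFoundException:
# Same read with the explicit DefaultSource class instead of the 'mongo' alias;
# connection details are placeholders.
df = (spark.read.format('com.mongodb.spark.sql.DefaultSource')
    .option('uri', 'mongodb://host:27017')
    .option('database', 'abc')
    .option('collection', 'xyz')
    .load())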
11-22-2021 07:42 AM
Hi @Kaniz Fatma,
The code is simple and it worked in a Databricks notebook:
(spark.read.format('mongo')
    .option("uri", uri)
    .option("database", "abc")
    .option("collection", "xyz")
    .load()).display()
Is it possible to run this code via databricks-connect?
11-22-2021 04:39 PM
@Kaniz Fatma Mine is very similar to @Hugh Vo's; it's just a standard Spark read using the `mongo` format.
11-22-2021 01:12 PM
@Yikun Song - Would you be happy to mark which answer is best so that others can find the solution more easily?
11-22-2021 04:38 PM
I am more than happy to, but there is no answer yet.
11-22-2021 05:47 PM
@Yikun Song - Thank you.
12-20-2021 10:21 AM
Hi @Yikun Song,
We found an issue with it that will be fixed in the next round of patches (to be released mid-January).
As a temporary workaround, add the connector's assembly jar via --jars and first run a SELECT 1 query to make sure the jar gets synced to the server.
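A rough sketch of that workaround with databricks-connect (the jar path is a placeholder, and the exact way you attach the jar may vary with your setup):
# Sketch of the workaround; the assembly jar path below is a placeholder.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.jars", "/path/to/mongo-spark-connector-assembly_2.12-3.0.1.jar")
    .getOrCreate())

# Run a trivial query first so the jar gets synced to the server.
spark.sql("SELECT 1").collect()

# The mongo read should then be able to resolve the data source.
df = (spark.read.format("mongo")
    .option("uri", "mongodb://host:27017")  # placeholder URI
    .option("database", "abc")
    .option("collection", "xyz")
    .load())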
01-24-2022 11:52 PM
Folks, the latest databricks-connect==9.1.7 fixed this.
01-29-2022 04:21 AM
It works now. Thanks!
06-20-2023 11:32 PM
Hi, I encountered the same issue. I know it's late, but do you still have a copy of your code so I can try it with mine? Thanks so much!
07-24-2023 06:53 AM - edited 07-25-2023 03:55 AM
Hi everyone, the solution for me was to replace spark.read.format("mongo") with spark.read.format("mongodb"). My Spark version is 3.3.2 and my MongoDB version is 6.0.6.
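For example, a read with the newer 'mongodb' source looks roughly like this (connection details are placeholders; note the newer connector takes connection.uri rather than uri):
# Read using the 'mongodb' data source name; values are placeholders.
df = (spark.read.format("mongodb")
    .option("connection.uri", "mongodb://host:27017")
    .option("database", "abc")
    .option("collection", "xyz")
    .load())
df.display()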