03-19-2024 02:52 AM
Hi,
We are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity.
When we try to display the DataFrame data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key count".
The data in the Mongo collection is in time-series (struct) format.
connectionString = "mongodb+srv://CONNECTION_STRING_HERE/"
database = "sample_supplies"
collection = "sales"
salesDF = spark.read.format("mongo").option("database", database).option("collection", collection).option("spark.mongodb.input.uri", connectionString).load()
display(salesDF)
"org.bson.BsonInvalidOperationException: Document does not contain key count"
03-19-2024 03:33 AM
Hi @pankaj30, thank you for your question! This error typically occurs when there's a mismatch between the MongoDB driver and Spark connector versions. Make sure the following JARs are installed on your cluster and are compatible with each other:
mongo-spark-connector
mongodb-driver-sync
mongodb-driver-core
bson
You can also point Spark at the JARs explicitly:
spark.conf.set("spark.jars", "/path/to/mongo-spark-connector.jar,/path/to/mongodb-driver-sync.jar,/path/to/mongodb-driver-core.jar,/path/to/bson.jar")
Ensure your connectionString is correctly formatted. It should include the MongoDB server details, username, password, and any other required parameters. Verify that the database and collection names match the actual names in your MongoDB instance.
Once you've resolved the BSON reference issue, run display(salesDF) again to show the data in your DataFrame.
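As a hedged sketch of the read itself: with the 10.x connector the short format name is "mongodb" (the older "mongo" name belongs to the 3.x/legacy connector), and the read URI option is spark.mongodb.read.connection.uri. The helper name below is mine, the connection URI is a placeholder, and the function assumes a connector 10.x JAR is already attached to the cluster:

```python
def read_mongo_collection(spark, connection_uri, database, collection):
    """Sketch: read a MongoDB collection into a Spark DataFrame.

    Assumes mongo-spark-connector 10.x is attached to the cluster. With
    that connector the format name is "mongodb" (not "mongo") and the
    read URI option is "spark.mongodb.read.connection.uri".
    """
    return (
        spark.read.format("mongodb")
        .option("spark.mongodb.read.connection.uri", connection_uri)
        .option("database", database)
        .option("collection", collection)
        .load()
    )
```

You would call it from a notebook as read_mongo_collection(spark, connectionString, database, collection); on a 3.x connector the original format("mongo") / spark.mongodb.input.uri spelling applies instead.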
03-19-2024 06:08 AM
Hi @Kaniz_Fatma, I tried all of the above steps, but it still didn't work. I am checking with the Mongo team in parallel.
07-19-2024 08:17 AM
Thanks, @Kaniz_Fatma, for your input. I had the same problem and couldn't display the DataFrame with only mongo-spark-connector installed on my cluster (DBR 14.3 LTS, Spark 3.5.0, Scala 2.12). After I installed the rest of the suggested JAR files it still failed, but after I changed the DBR to 13.3 LTS (Spark 3.4.1, Scala 2.12) it worked.
07-19-2024 09:20 AM
UPDATE:
Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files listed above to be installed on the cluster to display the DataFrame.
Also, I noticed that both DBR 13.3 LTS and 14.3 LTS work fine with this specific Spark connector JAR installed on the cluster.
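For completeness, a hedged sketch of attaching the connector by Maven coordinates instead of a manually downloaded -all JAR (the coordinate below is inferred from the artifact name in the update; on Databricks you would normally add it as a cluster library rather than build the session yourself, and the helper name is mine):

```python
def build_spark_with_mongo(app_name="mongo-read"):
    """Sketch: start a SparkSession that pulls the MongoDB connector from Maven.

    Coordinate assumed from mongo-spark-connector_2.12-10.3.0 mentioned above.
    On Databricks, attach org.mongodb.spark:mongo-spark-connector_2.12:10.3.0
    as a cluster library instead of configuring it here.
    """
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    return (
        SparkSession.builder.appName(app_name)
        .config(
            "spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.3.0",
        )
        .getOrCreate()
    )
```

Letting Maven resolve the coordinate pulls in the matching mongodb-driver-sync, mongodb-driver-core, and bson transitively, which is why the individual JARs above become unnecessary.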