Databricks Pyspark Dataframe error while displaying data read from mongodb

pankaj30
New Contributor II

Hi,

We are trying to read data from MongoDB in a Databricks notebook using PySpark.

When we try to display the DataFrame with the show() or display() method, it fails with the error "org.bson.BsonInvalidOperationException: Document does not contain key count".

The data in the Mongo collection is stored in a time series (struct) format.

connectionString = 'mongodb+srv://CONNECTION_STRING_HERE/'
database = "sample_supplies"
collection = "sales"
salesDF = (spark.read.format("mongo")
    .option("database", database)
    .option("collection", collection)
    .option("spark.mongodb.input.uri", connectionString)
    .load())
display(salesDF)


pankaj30
New Contributor II

Hi @Retired_mod, I tried all the steps above, but it still didn't work. In parallel, I'm checking with the Mongo team.

an313x
New Contributor III

Thanks, @Retired_mod, for your input. I had the same problem and couldn't display the DataFrame, with only mongo-spark-connector installed on my cluster (DBR 14.3 LTS, Spark 3.5.0, Scala 2.12). After I installed the rest of the suggested JAR files it still failed, but after I changed the DBR to 13.3 LTS (Spark 3.4.1, Scala 2.12) it worked.


an313x
New Contributor III

UPDATE:
Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the DataFrame:

  • bson
  • mongodb-driver-core
  • mongodb-driver-sync

Also, I noticed that both DBR 13.3 LTS and 14.3 LTS work fine with this specific Spark connector JAR installed on the cluster.
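For anyone landing here: the 10.x connector also changed the source name and option keys compared with the older "mongo" format used in the original post. A minimal sketch of the same read with mongo-spark-connector 10.x, assuming the all-in-one JAR from the update above is installed and `spark` is the notebook's session (the connection string is still a placeholder):

```python
# Read with mongo-spark-connector 10.x, which uses the "mongodb" source name
# and "spark.mongodb.read.*" option keys instead of the v3-style
# "mongo" / "spark.mongodb.input.uri".
# CONNECTION_STRING_HERE is a placeholder for your Atlas URI.
connectionString = 'mongodb+srv://CONNECTION_STRING_HERE/'
database = "sample_supplies"
collection = "sales"

salesDF = (
    spark.read.format("mongodb")
    .option("spark.mongodb.read.connection.uri", connectionString)
    .option("spark.mongodb.read.database", database)
    .option("spark.mongodb.read.collection", collection)
    .load()
)
display(salesDF)
```

This requires a running cluster with the connector JAR attached, so treat it as a configuration sketch rather than a standalone script.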
