Databricks Pyspark Dataframe error while displaying data read from mongodb

pankaj30
New Contributor II

Hi,

We are trying to read data from MongoDB in a Databricks notebook using PySpark.

When we try to display the DataFrame with the show() or display() method, it fails with the error "org.bson.BsonInvalidOperationException: Document does not contain key count".

The data in the Mongo collection is stored in a time series (struct) format.

connectionString = 'mongodb+srv://CONNECTION_STRING_HERE/'
database = "sample_supplies"
collection = "sales"
salesDF = (spark.read.format("mongo")
    .option("database", database)
    .option("collection", collection)
    .option("spark.mongodb.input.uri", connectionString)
    .load())
display(salesDF)


pankaj30
New Contributor II

Hi @Retired_mod, I tried all the steps above, but it still didn't work. In parallel, I'm checking with the Mongo team.

an313x
New Contributor III

Thanks, @Retired_mod, for your input. I had the same problem and couldn't display the DataFrame, with only mongo-spark-connector installed on my cluster (DBR 14.3 LTS, Spark 3.5.0, Scala 2.12). After I installed the rest of the suggested JAR files it still failed, but after I changed the DBR to 13.3 LTS (Spark 3.4.1, Scala 2.12) it worked.


an313x
New Contributor III

UPDATE:
Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the DataFrame:

  • bson
  • mongodb-driver-core
  • mongodb-driver-sync

Also, I noticed that both DBR 13.3 LTS and 14.3 LTS work fine with this specific Spark connector JAR installed on the cluster.
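For anyone landing here: the 10.x connector also changed the source name and option keys compared with the older "mongo" format used in the original post. A minimal sketch of the same read with mongo-spark-connector 10.x, assuming the all-in-one JAR from the update above is installed and `spark` is the notebook's session (the connection string is still a placeholder):

```python
# Read with mongo-spark-connector 10.x, which uses the "mongodb" source name
# and "spark.mongodb.read.*" option keys instead of the v3-style
# "mongo" / "spark.mongodb.input.uri".
# CONNECTION_STRING_HERE is a placeholder for your Atlas URI.
connectionString = 'mongodb+srv://CONNECTION_STRING_HERE/'
database = "sample_supplies"
collection = "sales"

salesDF = (
    spark.read.format("mongodb")
    .option("spark.mongodb.read.connection.uri", connectionString)
    .option("spark.mongodb.read.database", database)
    .option("spark.mongodb.read.collection", collection)
    .load()
)
display(salesDF)
```

This requires a running cluster with the connector JAR attached, so treat it as a configuration sketch rather than a standalone script.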
