Databricks PySpark DataFrame error while displaying data read from MongoDB

New Contributor II

Hi,

We are trying to read data from MongoDB in a Databricks notebook using PySpark.

When we try to display the DataFrame contents with show() or display(), it fails with the error "org.bson.BsonInvalidOperationException: Document does not contain key count".

The data in the MongoDB collection is in time series (struct) format.

salesDF = spark.read.format("mongo").option("database", database).option("collection", collection).option("spark.mongodb.input.uri", connectionString).load()


Community Manager

Hi @pankaj30, thank you for your question! This error typically occurs when there’s a mismatch between the MongoDB driver and Spark connector versions.

  • Are you sure all the necessary MongoDB drivers and BSON libraries are available to Spark on your cluster?
  • If yes, please check that you have downloaded the following JAR files (for the appropriate Spark and Scala versions) from Maven:
    • mongo-spark-connector
    • mongodb-driver-sync
    • mongodb-driver-core
    • bson
  • These JAR files contain the necessary classes and methods for MongoDB connectivity.
  • You can find these JAR files on Maven Central or other repositories.
  • Place these JAR files in a directory accessible to your Spark cluster.
  • In your Databricks Notebook, set the Spark configuration to include the paths to the downloaded JAR files:
spark.conf.set("spark.jars", "/path/to/mongo-spark-connector.jar,/path/to/mongodb-driver-sync.jar,/path/to/mongodb-driver-core.jar,/path/to/bson.jar")
  • Ensure that your data schema matches the expected schema when reading it into a DataFrame. If there are missing fields or inconsistencies, it can lead to issues like the one you’re encountering.
  • Make sure your connectionString is correctly formatted. It should include the MongoDB server details, username, password, and other required parameters.
  • Verify that the database and collection names match the actual names in your MongoDB instance.

  • Once you’ve resolved the BSON reference issue, run display(salesDF) again to show the data in your DataFrame.

  • If you encounter any further issues, please ask for additional assistance! 
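
As a rough illustration of the connection string check above, here is a minimal sketch that uses only the Python standard library. The helper name check_mongo_uri and the example host and credentials are hypothetical, and it only checks the URI scheme and that a host is present. It does not contact the server and does not validate credentials or options.

```python
from urllib.parse import urlsplit


def check_mongo_uri(uri):
    """Return a list of basic formatting problems found in a MongoDB URI.

    Hypothetical helper for illustration only: it inspects the string
    and never connects to MongoDB.
    """
    problems = []
    parts = urlsplit(uri)

    # MongoDB connection strings start with mongodb:// or mongodb+srv://
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append("scheme must be mongodb:// or mongodb+srv://")

    # A host must follow the scheme (and the optional user:password@ part)
    if not parts.hostname:
        problems.append("no host found")

    return problems


# Example with made-up values; substitute your own connectionString.
print(check_mongo_uri("mongodb://user:secret@example.com:27017/sales"))
```

An empty list only means the string has the expected overall shape. A wrong password, database name, or collection name would still have to be verified against the MongoDB instance itself.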

New Contributor II

Hi @Kaniz_Fatma, I tried all of the above steps, but it still didn't work. We are checking with the Mongo team in parallel.

