
Databricks Pyspark Dataframe error while displaying data read from mongodb

pankaj30
New Contributor II

Hi,

We are trying to read data from MongoDB in a Databricks notebook using PySpark.

When we try to display the DataFrame with show() or display(), it fails with the error "org.bson.BsonInvalidOperationException:Document does not contain key count".

The data in the Mongo collection is in time series (struct) format.

connectionString = 'mongodb+srv://CONNECTION_STRING_HERE/'  # placeholder
database = "sample_supplies"
collection = "sales"
salesDF = (spark.read.format("mongo")
           .option("database", database)
           .option("collection", collection)
           .option("spark.mongodb.input.uri", connectionString)
           .load())
display(salesDF)  # the error is raised here

"org.bson.BsonInvalidOperationException:Document does not contain key count" 

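A minimal sketch of the same read wrapped in a helper, so that the data source name and the URI option can be switched without rewriting the chain. The function name read_mongo_collection and its parameters are hypothetical; the defaults are the values used in the snippet above. As an assumption to verify against the connector version installed on the cluster: newer connector releases (10.x) register the source as "mongodb" and take the URI under "connection.uri".

```python
def read_mongo_collection(spark, uri, database, collection,
                          source="mongo",
                          uri_option="spark.mongodb.input.uri"):
    """Load one MongoDB collection as a DataFrame.

    `spark` is the active SparkSession (predefined in a Databricks notebook).
    `source` and `uri_option` default to the values used in the post above;
    they are parameters because they differ between connector versions.
    """
    # Fail early on empty settings instead of deep inside the connector.
    for name, value in (("uri", uri), ("database", database),
                        ("collection", collection)):
        if not value:
            raise ValueError(f"{name} must be a non-empty string")
    return (spark.read.format(source)
            .option(uri_option, uri)
            .option("database", database)
            .option("collection", collection)
            .load())


# Usage in a notebook (not executed here):
# salesDF = read_mongo_collection(spark, connectionString, "sample_supplies", "sales")
# Assumed 10.x-style names, to be checked against the installed connector:
# salesDF = read_mongo_collection(spark, connectionString, "sample_supplies", "sales",
#                                 source="mongodb", uri_option="connection.uri")
```

Keeping the source name and option key as arguments makes it easy to test one connector version against another without touching the rest of the notebook.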
2 REPLIES

Kaniz
Community Manager

Hi @pankaj30, thank you for your question! This error typically occurs when there’s a mismatch between the MongoDB driver and Spark connector versions.

  • Are you sure all the MongoDB driver and BSON libraries your code needs are available to Spark?
  • If so, please check that you have downloaded the following JAR files (for the appropriate Spark and Scala versions) from Maven:
    • mongo-spark-connector
    • mongodb-driver-sync
    • mongodb-driver-core
    • bson
  • These JAR files contain the classes and methods needed for MongoDB connectivity.
  • You can find these JAR files on Maven Central or other repositories.
  • Place these JAR files in a directory accessible to your Spark cluster.
  • Set the Spark configuration to include the paths to the downloaded JAR files. Note that spark.jars is normally read when the Spark application starts, so setting it from a running notebook may have no effect; on Databricks, installing the JARs as cluster libraries or setting this in the cluster's Spark config is usually the more reliable route:
spark.conf.set("spark.jars", "/path/to/mongo-spark-connector.jar,/path/to/mongodb-driver-sync.jar,/path/to/mongodb-driver-core.jar,/path/to/bson.jar")
  • Ensure that your data matches the schema you expect when reading it into a DataFrame. Missing fields or inconsistencies can lead to issues like the one you’re encountering.
  • Make sure your connectionString is correctly formatted. It should include the MongoDB server details, the username and password, and any other required parameters.
  • Verify that the database and collection names match the actual names in your MongoDB instance.
  • Once you’ve resolved the BSON issue, run display(salesDF) again to show the data in your DataFrame.
  • If you encounter any further issues, please ask for additional assistance!
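
The connection string check from the list above can be sketched as a small pre-flight step. This is an illustration only, not part of the connector: build_mongo_uri and check_mongo_uri are hypothetical helpers, the host and credentials are made up, and the check only inspects the shape of the URI with Python's standard urllib.parse. It does not contact the server, so it cannot confirm that the database or collection exists.

```python
from urllib.parse import quote_plus, urlsplit


def build_mongo_uri(user, password, host, srv=True):
    """Assemble a connection string, percent-encoding the credentials.

    Reserved characters such as '@', ':' or '/' in a username or password
    must be percent-encoded, otherwise the URI is parsed incorrectly.
    """
    scheme = "mongodb+srv" if srv else "mongodb"
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/"


def check_mongo_uri(uri):
    """Return a list of problems found in the URI (empty if none found)."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append("scheme must be mongodb:// or mongodb+srv://")
    if not parts.hostname:
        problems.append("missing host")
    if parts.username is None or parts.password is None:
        # Acceptable only if the deployment uses another auth mechanism.
        problems.append("no username/password in the URI")
    return problems


# Hypothetical values, for illustration only.
uri = build_mongo_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
print(check_mongo_uri(uri))
```

Running the check before spark.read separates a malformed connection string from a connector or driver problem, which narrows down where an error such as the one in this thread comes from.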

pankaj30
New Contributor II

Hi @Kaniz, I tried all the steps above, but it still didn't work. I am checking with the Mongo team in parallel.
