Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks PySpark DataFrame error while displaying data read from MongoDB

pankaj30
New Contributor II

Hi,

We are trying to read data from MongoDB in a Databricks notebook using PySpark.

When we try to display the DataFrame with the show() or display() method, it fails with the error "org.bson.BsonInvalidOperationException: Document does not contain key count".

The data in the Mongo collection is in time series (struct) format.

connectionString = "mongodb+srv://CONNECTION_STRING_HERE/"
database = "sample_supplies"
collection = "sales"

salesDF = (spark.read.format("mongo")
           .option("database", database)
           .option("collection", collection)
           .option("spark.mongodb.input.uri", connectionString)
           .load())
display(salesDF)

"org.bson.BsonInvalidOperationException:Document does not contain key count" 

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @pankaj30, thank you for your question! This error typically occurs when there is a mismatch between the MongoDB driver and Spark connector versions.

  • Are you sure all the necessary MongoDB driver and BSON libraries are available to Spark?
  • If yes, please check that you have downloaded the JAR files below (for the appropriate Spark and Scala versions) from Maven:
    • mongo-spark-connector
    • mongodb-driver-sync
    • mongodb-driver-core
    • bson
  • These JAR files contain the classes and methods needed for MongoDB connectivity.
  • You can find them on Maven Central or other repositories.
  • Place them in a directory accessible to your Spark cluster.
  • In your Databricks notebook, set the Spark configuration to include the paths to the downloaded JAR files (for an alternative that installs the connector as a cluster library, see the first sketch after this list):
spark.conf.set("spark.jars", "/path/to/mongo-spark-connector.jar,/path/to/mongodb-driver-sync.jar,/path/to/mongodb-driver-core.jar,/path/to/bson.jar")
  • Ensure that your data schema matches the expected schema when reading it into a DataFrame; missing fields or inconsistencies can lead to issues like the one you are encountering (see the second sketch after this list).
  • Make sure your connectionString is correctly formatted. It should include the MongoDB server details, username, password, and any other required parameters.
  • Verify that the database and collection names match the actual names in your MongoDB instance.
  • Once you have resolved the BSON issue, run display(salesDF) again to show the data in your DataFrame.
  • If you encounter any further issues, please ask for additional assistance!
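
A minimal sketch of the library approach, assuming the 10.x MongoDB Spark Connector is installed on the cluster as a Maven library (coordinates along the lines of org.mongodb.spark:mongo-spark-connector_2.12:10.x.x, matched to your cluster's Scala and Spark versions) rather than as hand-copied JARs. Note that the 10.x connector uses the source name "mongodb" and the option spark.mongodb.read.connection.uri, which differ from the older "mongo" / spark.mongodb.input.uri names in the original snippet:

# Assumes the connector was added to the cluster as a Maven library, e.g.
# org.mongodb.spark:mongo-spark-connector_2.12:10.x.x (pick the release matching
# your cluster's Scala/Spark versions); it pulls in the matching
# mongodb-driver-sync, mongodb-driver-core, and bson JARs transitively.

connectionString = "mongodb+srv://CONNECTION_STRING_HERE/"  # placeholder, as in the question

salesDF = (spark.read.format("mongodb")                     # 10.x source name
           .option("spark.mongodb.read.connection.uri", connectionString)
           .option("database", "sample_supplies")
           .option("collection", "sales")
           .load())
display(salesDF)

Installing by Maven coordinates keeps the connector, driver, and BSON versions consistent, which is the kind of mismatch this error usually points to.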
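
On the schema point, a sketch of reading with an explicit schema instead of relying on inference. The field names and types below are purely illustrative (not taken from the thread) and would need to be replaced with the ones actually present in the sales documents; the same 10.x connector assumptions as the previous sketch apply:

from pyspark.sql.types import StructType, StructField, StringType, TimestampType, BooleanType

# Illustrative schema only -- replace these fields with the ones your "sales"
# documents actually contain (inspect a sample document in mongosh or Compass).
salesSchema = StructType([
    StructField("saleDate", TimestampType(), True),
    StructField("storeLocation", StringType(), True),
    StructField("purchaseMethod", StringType(), True),
    StructField("couponUsed", BooleanType(), True),
])

connectionString = "mongodb+srv://CONNECTION_STRING_HERE/"  # placeholder

salesDF = (spark.read.format("mongodb")
           .schema(salesSchema)                             # explicit schema, so no inference pass
           .option("spark.mongodb.read.connection.uri", connectionString)
           .option("database", "sample_supplies")
           .option("collection", "sales")
           .load())
salesDF.printSchema()

Printing the schema before calling display() makes it easier to spot a field that does not match what the collection actually stores.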

pankaj30
New Contributor II

Hi @Kaniz_Fatma, I tried all of the above steps, but it still didn't work. I'm checking with the MongoDB team in parallel.
