
Issue with MongoDB Spark Connector in Databricks

vidya_kothavale
New Contributor III

 

I followed the official Databricks documentation (https://docs.databricks.com/en/_extras/notebooks/source/mongodb.html) to integrate MongoDB Atlas with Spark by setting up the MongoDB Spark Connector and configuring the connection string on my Databricks cluster. However, I am running into issues when trying to read data from MongoDB through Spark.

I can successfully connect to MongoDB using MongoClient in Python and run queries like this:

from pymongo import MongoClient

# Direct pymongo connection works: the document is returned as expected
client = MongoClient("connectionstring")  # placeholder connection string
db = client["demo"]
collection = db["demo_collection"]
print(collection.find_one())

However, I am unable to load data using the Spark connector with the following code:

df = spark.read.format("mongodb") \ .option("database", database) \ .option("spark.mongodb.input.uri", connectionString) \ .option("collection", "demo_collection") \ .load() df.printSchema()

The connection string is the same in both cases, and I have confirmed that the necessary permissions and IP whitelisting are correctly configured in MongoDB Atlas.

Despite this, no data is being retrieved when using Spark, and I'm unable to identify the issue.

Also, I have attached an error screenshot below.

Can anyone provide guidance on potential configuration issues or additional steps needed to troubleshoot this problem with the MongoDB Spark connector in Databricks?

1 ACCEPTED SOLUTION

szymon_dybczak
Esteemed Contributor III

Hi @vidya_kothavale ,

Could you try changing "spark.mongodb.input.uri" to the following?

spark.read.format("mongodb").option("spark.mongodb.read.connection.uri", connectionString)

 


3 REPLIES

Thanks, @szymon_dybczak! It's working perfectly now.

Hi vidya. I have the same problem.

I can connect using pymongo and Compass. I installed the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1 (the latest one) on my cluster running Databricks Runtime 16.2, but I was never able to connect to the same Mongo cluster (sharded) using the primary as default.

This is the Scala code (I've tested in Python as well; a sketch of the Python version follows the snippet):

val connstr = "mongodb://user:xxxxxxx@cluster/dbxxx?tls=true&tlsInsecure=true&authSource=admin"

val df = spark.read.format("mongodb")
.option("database", "dbdbdbdbdb")
.option("spark.mongodb.read.connection.uri", connstr)
.option("collection", "cccccccccc")
.load().limit(5)
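
For reference, the Python version is essentially the same (a sketch with the same placeholder URI, database, and collection names as the Scala snippet):

# Sketch: Python equivalent of the Scala read above, using the same
# placeholder connection string, database, and collection names
connstr = "mongodb://user:xxxxxxx@cluster/dbxxx?tls=true&tlsInsecure=true&authSource=admin"

df = (
    spark.read.format("mongodb")
    .option("database", "dbdbdbdbdb")
    .option("spark.mongodb.read.connection.uri", connstr)
    .option("collection", "cccccccccc")
    .load()
    .limit(5)
)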
 
Also, I can telnet to the cluster successfully.
 
Any clues?