Databricks Community

vidya_kothavale · ‎01-27-2025

I followed the official Databricks documentation("https://docs.databricks.com/en/_extras/notebooks/source/mongodb.html")

to integrate MongoDB Atlas with Spark by setting up the MongoDB Spark Connector and configuring the connection string in my Databricks cluster. However, I am encountering issues when trying to read data from MongoDB using Spark.

While I can successfully connect to MongoDB using the MongoClient in Python and execute queries like

from pymongo import MongoClient

client = MongoClient("connectionstring")

db = client["demo"]

collection = db["demo_collection"]

print(collection.find_one())

I am unable to load data using the Spark connector with the following code:

df = spark.read.format("mongodb") \ .option("database", database) \ .option("spark.mongodb.input.uri", connectionString) \ .option("collection", "demo_collection") \ .load() df.printSchema()

The connection string is the same in both cases, and I have confirmed that the necessary permissions and IP whitelisting are correctly configured in MongoDB Atlas.

Despite this, no data is being retrieved when using Spark, and I’m unable to identify the issue.

also, I attached error screenshot below.

Can anyone provide guidance on potential configuration issues or additional steps needed to troubleshoot this problem with the MongoDB Spark connector in Databricks?

szymon_dybczak · ‎01-27-2025

Hi @vidya_kothavale ,

Could you try to change "spark.mongodb.input.uri" to following?

spark.read.format("mongodb").option("spark.mongodb.read.connection.uri"

View solution in original post

szymon_dybczak · ‎01-27-2025

Hi @vidya_kothavale ,

Could you try to change "spark.mongodb.input.uri" to following?

spark.read.format("mongodb").option("spark.mongodb.read.connection.uri"

vidya_kothavale · ‎01-27-2025

Thanks! @szymon_dybczak It's working perfectly now.

RobsonNLPT · ‎03-01-2025

Hi vidya. I have the same problem.

I can connect using pymongo and compass . I installed the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1 (latest one) on my cluster using the runtime 16.2 but I never was able to connect to same mongo cluster (sharded) using the primary as default.

This is the scala code (I've tested in python as well)

val connstr = "mongodb://user:xxxxxxx@cluster/dbxxx?tls=true&tlsInsecure=true&authSource=admin"

val df = spark.read.format("mongodb")

.option("database", "dbdbdbdbdb")

.option("spark.mongodb.read.connection.uri", connstr)

.option("collection", "cccccccccc")

.load().limit(5)

Also I can telnet the cluster successfully .

Any clues?

prateekmanocha · ‎09-15-2025

Hey @szymon_dybczak @vidya_kothavale facing similar issue not sure how to resolve..

cluster runtime : 16.4 LTS (includes Apache Spark 3.5.2, Scala 2.12)
mongodb spark connector : org.mongodb.spark:mongo-spark-connector_2.12:10.5.0

code:

df = (spark.read

.format("mongodb")

.option("spark.mongodb.read.connection.uri", "mongodb+srv://nucleus-auth-prd:<mypassword>@prd-default-pl-1.gzopt.mongodb.net/nucleus-auth-prd-default?authSource=admin&readPreference=secondary")

.option("database", "nucleus-auth-prd-default")

.option("collection", "userEntities")

.load()

)

Error:
(com.mongodb.MongoSecurityException) Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='nucleus-auth-prd', source='admin', password=<hidden>, mechanismProperties=<hidden>}

Can you help what could be the missing piece for me ?