Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Issue with MongoDB Spark Connector in Databricks

vidya_kothavale
Contributor

 

I followed the official Databricks documentation (https://docs.databricks.com/en/_extras/notebooks/source/mongodb.html) to integrate MongoDB Atlas with Spark by setting up the MongoDB Spark Connector and configuring the connection string on my Databricks cluster. However, I am running into issues when trying to read data from MongoDB through Spark.

While I can successfully connect to MongoDB using the MongoClient in Python and execute queries like

from pymongo import MongoClient
client = MongoClient("connectionstring")
db = client["demo"]
collection = db["demo_collection"]
print(collection.find_one())

 I am unable to load data using the Spark connector with the following code:

df = spark.read.format("mongodb") \
    .option("database", database) \
    .option("spark.mongodb.input.uri", connectionString) \
    .option("collection", "demo_collection") \
    .load()

df.printSchema()

The connection string is the same in both cases, and I have confirmed that the necessary permissions and IP whitelisting are correctly configured in MongoDB Atlas.

Despite this, no data is being retrieved when using Spark, and Iโ€™m unable to identify the issue.

I've also attached an error screenshot below.

Can anyone provide guidance on potential configuration issues or additional steps needed to troubleshoot this problem with the MongoDB Spark connector in Databricks?

1 ACCEPTED SOLUTION


szymon_dybczak
Esteemed Contributor III

Hi @vidya_kothavale ,

Could you try changing "spark.mongodb.input.uri" to the following?

spark.read.format("mongodb").option("spark.mongodb.read.connection.uri", connectionString)
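For context, connector 10.x renamed the legacy read/write URI options used by earlier connector versions. A minimal pure-Python sketch of that renaming (the `migrate_options` helper is hypothetical, just for illustration, not part of the connector):

```python
# Legacy (pre-10.x) option keys mapped to the keys expected by
# spark.read.format("mongodb") / spark.write.format("mongodb") in 10.x.
LEGACY_TO_V10 = {
    "spark.mongodb.input.uri": "spark.mongodb.read.connection.uri",
    "spark.mongodb.output.uri": "spark.mongodb.write.connection.uri",
}

def migrate_options(options):
    """Return a copy of a Spark option dict with legacy keys renamed."""
    return {LEGACY_TO_V10.get(k, k): v for k, v in options.items()}

old = {"spark.mongodb.input.uri": "mongodb://...", "database": "demo"}
print(migrate_options(old))
# → {'spark.mongodb.read.connection.uri': 'mongodb://...', 'database': 'demo'}
```

Keys that are not in the mapping (like "database" and "collection") pass through unchanged.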

 


4 REPLIES


Thanks! @szymon_dybczak  It's working perfectly now.

Hi vidya. I have the same problem.

I can connect using pymongo and Compass. I installed the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1 (the latest one) on my cluster using runtime 16.2, but I was never able to connect to the same Mongo cluster (sharded) using the primary as default.

This is the Scala code (I've tested it in Python as well):

val connstr = "mongodb://user:xxxxxxx@cluster/dbxxx?tls=true&tlsInsecure=true&authSource=admin"

val df = spark.read.format("mongodb")
.option("database", "dbdbdbdbdb")
.option("spark.mongodb.read.connection.uri", connstr)
.option("collection", "cccccccccc")
.load().limit(5)
 
Also, I can telnet to the cluster successfully.
 
Any clues?

Hey @szymon_dybczak @vidya_kothavale, I'm facing a similar issue and I'm not sure how to resolve it.

Cluster runtime: 16.4 LTS (includes Apache Spark 3.5.2, Scala 2.12)
MongoDB Spark connector: org.mongodb.spark:mongo-spark-connector_2.12:10.5.0

code: 

df = (spark.read
.format("mongodb")
.option("spark.mongodb.read.connection.uri", "mongodb+srv://nucleus-auth-prd:<mypassword>@prd-default-pl-1.gzopt.mongodb.net/nucleus-auth-prd-default?authSource=admin&readPreference=secondary")
.option("database", "nucleus-auth-prd-default")
.option("collection", "userEntities")
.load()
)

Error: 
(com.mongodb.MongoSecurityException) Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='nucleus-auth-prd', source='admin', password=<hidden>, mechanismProperties=<hidden>}

Can you help me figure out what the missing piece might be?