02-27-2025 08:10 AM - edited 02-27-2025 08:16 AM
Hi.
I'm testing a Databricks connection to a MongoDB v7 cluster (on Azure) using the library org.mongodb.spark:mongo-spark-connector_2.13:10.4.1.
I can connect using Compass, but I get a timeout error from my ADB notebook:
MongoTimeoutException: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]
By the way, I can telnet to the server successfully (%sh telnet.....).
Any ideas?
02-28-2025 06:32 AM
Any help?
03-04-2025 06:39 AM
Hi. Not a solution I'm afraid, but I'm having the exact same issue. Did you manage to resolve it at all?
What is throwing me is that I'm configuring the IP for the MongoDB instance, as it's running on an EC2 instance in AWS, but I still see 'localhost' in the error message, as if it's ignoring my configuration. Is this similar to what you're seeing?
03-04-2025 06:53 AM - edited 03-04-2025 06:55 AM
Hi.
Yes, same. I see localhost.
My cluster is deployed on Azure Kubernetes. I can connect using pymongo and also Compass.
I've tested with a free Atlas cluster and it worked as well (I changed the Atlas firewall rule to allow my Databricks workspace).
No clues.
14m ago
The error you're seeing, a MongoTimeoutException referencing localhost:27017, suggests your Databricks cluster is trying to connect to MongoDB using the wrong address, or that it cannot properly reach the MongoDB cluster endpoint from the notebook, even though telnet works from a shell command.
Wrong Host in Connection String:
The error log shows localhost:27017, which is almost always incorrect when connecting from Databricks to a cloud MongoDB cluster. The connection string in your Spark configuration or notebook is likely defaulting to localhost, which refers to the Databricks node, not your MongoDB cluster. Compass might connect because it’s running from your local machine, where you’ve specified the correct MongoDB URI.
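A quick way to confirm what the connector will actually pick up is to print the session config before reading. A minimal check (if the key is unset, the underlying Java driver falls back to localhost:27017, which matches your error):

print(spark.conf.get("spark.mongodb.read.connection.uri", "NOT SET"))
print(spark.conf.get("spark.mongodb.write.connection.uri", "NOT SET"))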
Network Connectivity:
Telnet confirms the network route, but Spark jobs run on worker nodes, which might have different networking rules. Also, running %sh telnet uses the driver, not the Spark executors, so it is not a definitive test for all nodes in the cluster.
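To test connectivity from the executors rather than the driver, you can run a small socket check inside a Spark job. A rough sketch, assuming sc is the preassigned SparkContext in your notebook; replace <cluster-host> with your actual MongoDB endpoint:

import socket

def check_mongo(host, port, timeout=5):
    # Attempt a raw TCP connection from whichever node runs this task
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return "ok"
    except Exception as e:
        return f"failed: {e}"

host, port = "<cluster-host>", 27017  # placeholder endpoint
# One task per partition, so the check runs on the executors, not the driver
print(sc.parallelize(range(4), 4).map(lambda _: check_mongo(host, port)).collect())

If the driver can connect but these tasks fail, the problem is worker-level networking.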
Firewall/Security Groups:
Even if telnet works from the driver, your MongoDB Atlas or Azure firewall may be blocking traffic from Databricks worker pools. Double-check your IP allowlist or VNet/NSG rules for MongoDB.
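If you are unsure which outbound IP to allowlist, one rough check (assuming the cluster has outbound internet access) is to ask an external service from the notebook:

%sh curl -s https://ifconfig.me

Note this shows the driver's egress IP; depending on your NAT/VNet setup the workers may present the same address, but that is not guaranteed.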
Make sure your Spark configuration uses the full MongoDB URI, not localhost. Example Spark config (in a cell):
spark.conf.set("spark.mongodb.read.connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>/test?retryWrites=true&w=majority")
spark.conf.set("spark.mongodb.write.connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>/test?retryWrites=true&w=majority")
Replace localhost:27017 with your actual cluster host, username, and password.
Try connecting with pymongo (if available):
from pymongo import MongoClient

# mongodb+srv URIs require dnspython (pip install "pymongo[srv]")
# serverSelectionTimeoutMS makes the client fail fast instead of hanging
client = MongoClient("mongodb+srv://<user>:<password>@<cluster-host>", serverSelectionTimeoutMS=5000)
print(client.server_info())
If this fails, the problem is at the network/firewall level or improper authentication.
Telnet from %sh only checks connectivity from the driver node.
For Spark clusters, networking must allow all worker nodes to reach the database. Workers spun up by Databricks may or may not share the same outbound IP as the driver node.
Ensure you have installed the correct mongo-spark-connector library on your cluster via the Libraries tab.
Confirm all Spark jobs use the correct connector version (connector 10.4.1 supports MongoDB 7.x).
Double-check any environment variables or secret scopes used for credentials, as in the sketch below.
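For example, building the URI from a Databricks secret scope (the scope and key names here are hypothetical):

# Hypothetical scope/key names; replace with your own secret scope setup
user = dbutils.secrets.get(scope="mongo", key="username")
password = dbutils.secrets.get(scope="mongo", key="password")
# URL-encode the password in case it contains special characters
from urllib.parse import quote_plus
uri = f"mongodb+srv://{user}:{quote_plus(password)}@<cluster-host>/?retryWrites=true&w=majority"
spark.conf.set("spark.mongodb.read.connection.uri", uri)
spark.conf.set("spark.mongodb.write.connection.uri", uri)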
Using mongodb+srv:// is recommended for Atlas or DNS-enabled clusters.
If you use private endpoints or VNet integration, ensure Databricks has proper routing/subnet permissions.
If you are using auth sources or custom databases, add those parameters to your URI.
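For instance, if your users are defined in the admin database (an assumption; use whichever database actually holds your users), the URI would look like:

mongodb://<user>:<password>@<cluster-host>:27017/<database>?authSource=admin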
Then read with the connector, passing database and collection as options (connector 10.x expects connection.uri rather than the older uri option):

df = (spark.read.format("mongodb")
    .option("connection.uri", "mongodb+srv://<user>:<password>@<cluster-host>")
    .option("database", "<database>")
    .option("collection", "<collection>")
    .load())
df.show()
Summary:
Your issue is likely due to an incorrect URI (defaulting to localhost:27017) or network/firewall restrictions unique to the Databricks execution environment, not your laptop. Double-check your connection string in the notebook, test with a standalone Python client, and make sure all nodes have the necessary network permissions to reach MongoDB.