Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

MongoDB connector - Connection timeout when trying to connect to AWS DocumentDB

rijin-thomas
New Contributor II

I am on Databricks Runtime LTS 14.3 (Spark 3.5.0, Scala 2.12) with mongodb-spark-connector_2.12:10.2.0.

I am trying to connect to DocumentDB using the connector and all I get is a connection timeout. I tried PyMongo, which works as expected and lets me read from the database. I have a CA file that I'm passing as an argument; it's stored in Unity Catalog.

Code:

CONNECTION_URI = f"mongodb://{USERNAME}:{PASSWORD}@{ENDPOINT}:27017/{DATABASE_NAME}?replicaSet=rs0&readPreference=secondaryPreferred"

df = spark.read.format("mongodb") \
     .option("spark.mongodb.read.connection.uri", CONNECTION_URI) \
     .option("collection", COLLECTION) \
     .option("ssl", "true") \
     .option("ssl.CAFile", DBFS_CA_FILE) \
     .option("ssl.enabledProtocols", "TLSv1.2") \
     .load()

 Connector error:

SparkConnectGrpcException: (com.mongodb.MongoTimeoutException) Timed out after 30000 ms while waiting for a server that matches com.mongodb.client.internal.MongoClientDelegate. Client view of state is {type=REPLICA_SET, servers=[{address=<endpoint>, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}]
Rijin Thomas
4 REPLIES

bianca_unifeye
Contributor III

If PyMongo works but the Spark connector times out, the issue is almost always JVM TLS configuration or executor-level network access, not credentials or the database itself.

 

  • TLS handling (most common cause):
    The MongoDB Spark connector runs on the JVM and does not handle CA PEM files the way PyMongo does. Use a JVM truststore (JKS or PKCS12) instead of ssl.CAFile, and configure it via JVM options on both the driver and the executors (see the sketch after this list).

  • Executor connectivity:
    PyMongo usually tests connectivity from the driver only. Spark reads from the executors, so confirm that all worker nodes can reach the DocumentDB endpoint on port 27017 (security groups, routes, DNS); a quick executor-side check is sketched below.

  • Enable TLS via the URI:
    Set TLS explicitly in the connection string (e.g. tls=true) rather than relying on connector options.

  • DocumentDB compatibility:
    Add retryWrites=false to the connection string to align with Amazon DocumentDB limitations.
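
Putting the first, third and fourth points together, here is a minimal sketch of what the read could look like once a JVM truststore is in place. The truststore path and password are assumptions, and the extraJavaOptions lines are shown as comments because they have to be set in the cluster's Spark config rather than from a running notebook:

# Sketch only -- truststore path and password are placeholders to adapt.
# Cluster Spark config (set on the cluster, not at runtime):
#   spark.driver.extraJavaOptions   -Djavax.net.ssl.trustStore=/dbfs/certs/documentdb-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
#   spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/documentdb-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit

CONNECTION_URI = (
    f"mongodb://{USERNAME}:{PASSWORD}@{ENDPOINT}:27017/{DATABASE_NAME}"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
    "&tls=true&retryWrites=false"  # TLS and the DocumentDB limitation handled in the URI
)

df = (
    spark.read.format("mongodb")
    .option("spark.mongodb.read.connection.uri", CONNECTION_URI)
    .option("database", DATABASE_NAME)
    .option("collection", COLLECTION)
    .load()
)

And a quick way to confirm that the executors, not just the driver, can reach the endpoint on port 27017. This is just a plain socket check wrapped in a UDF so it runs on the workers; ENDPOINT is assumed to be the same variable used above:

import socket
from pyspark.sql.functions import udf

@udf("string")
def check_endpoint(_):
    # Attempt a raw TCP connection from whichever executor runs this row.
    try:
        with socket.create_connection((ENDPOINT, 27017), timeout=5):
            return "reachable"
    except OSError as e:
        return f"unreachable: {e}"

spark.range(8).select(check_endpoint("id").alias("result")).show(truncate=False)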

 

Hello @bianca_unifeye,

I was able to solve this by adding a JVM truststore. It involved modifying the default Java cacerts file and appending the custom certificate to it. I followed the Databricks KB article "How to import a custom CA certificate". Thanks for the response!
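
For anyone who lands here later, the gist of that approach sketched in Python; in practice it belongs in a cluster-scoped init script (the KB article uses bash) so it runs on every node, and the certificate path and alias below are placeholders, with the cacerts location derived from JAVA_HOME:

import os
import subprocess

CA_PEM = "/dbfs/certs/documentdb-ca.pem"  # placeholder path to the custom CA certificate
CACERTS = os.path.join(os.environ["JAVA_HOME"], "lib", "security", "cacerts")

# Append the custom CA to the JVM's default truststore so the MongoDB Spark
# connector (which runs on the JVM) trusts the DocumentDB endpoint.
subprocess.run(
    [
        "keytool", "-importcert", "-noprompt",
        "-keystore", CACERTS,
        "-storepass", "changeit",  # default cacerts password
        "-alias", "documentdb-ca",
        "-file", CA_PEM,
    ],
    check=True,
)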

Rijin Thomas

Glad I was able to help!

Sanjeeb2024
Contributor III

Hi @rijin-thomas - can you please allow the CIDR block of the Databricks VPC in the AWS DocumentDB security group (the executor connectivity point @bianca_unifeye mentioned)?
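
If it helps, a hedged sketch of that security-group change using boto3; the group ID, region and CIDR are placeholders for the actual DocumentDB security group and the Databricks VPC range:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Allow inbound 27017 from the Databricks VPC CIDR to the DocumentDB security group.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # DocumentDB security group (placeholder)
    IpProtocol="tcp",
    FromPort=27017,
    ToPort=27017,
    CidrIp="10.0.0.0/16",            # Databricks VPC CIDR (placeholder)
)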

Sanjeeb Mohapatra