MongoDB connector - connection timeout when trying to connect to AWS DocumentDB

rijin-thomas
New Contributor II

I am on Databricks Runtime 14.3 LTS (Spark 3.5.0, Scala 2.12) with mongodb-spark-connector_2.12:10.2.0.

I am trying to connect to DocumentDB using the connector, and all I get is a connection timeout. I tried PyMongo, which works as expected and lets me read from the database. I have a CA file, stored in Unity Catalog, that I'm passing as an option.

Code:

CONNECTION_URI = f"mongodb://{USERNAME}:{PASSWORD}@{ENDPOINT}:27017/{DATABASE_NAME}?replicaSet=rs0&readPreference=secondaryPreferred"

df = spark.read.format("mongodb") \
     .option("spark.mongodb.read.connection.uri", CONNECTION_URI) \
     .option("collection", COLLECTION) \
     .option("ssl", "true") \
     .option("ssl.CAFile", DBFS_CA_FILE) \
     .option("ssl.enabledProtocols", "TLSv1.2") \
     .load()

Connector error:

SparkConnectGrpcException: (com.mongodb.MongoTimeoutException) Timed out after 30000 ms while waiting for a server that matches com.mongodb.client.internal.MongoClientDelegate. Client view of state is {type=REPLICA_SET, servers=[{address=<endpoint>, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}]
Rijin Thomas

bianca_unifeye
Databricks MVP

If PyMongo works but the Spark connector times out, the issue is almost always JVM TLS configuration or executor-level network access, not credentials or the database itself.

 

  • TLS handling (most common cause):
    The MongoDB Spark connector runs on the JVM and does not handle CA PEM files the same way as PyMongo. Use a JVM truststore (JKS or PKCS12) instead of ssl.CAFile, and configure it via JVM options for both driver and executors.

  • Executor connectivity:
    PyMongo usually tests connectivity from the driver only. Spark reads from executors, so confirm that all worker nodes can reach the DocumentDB endpoint on port 27017 (security groups, routes, DNS).

  • Enable TLS via URI:
    Set TLS explicitly in the connection string (e.g. tls=true) rather than relying on connector options.

  • DocumentDB compatibility:
    Add retryWrites=false to the connection string, since Amazon DocumentDB does not support retryable writes.
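Putting the points above together, here is a minimal sketch of the read path. The credentials, endpoint, database name, and truststore path below are placeholders, not values from this thread:

```python
# Hypothetical connection details -- replace with your own.
USERNAME = "appuser"
PASSWORD = "secret"
ENDPOINT = "docdb.cluster-example.us-east-1.docdb.amazonaws.com"
DATABASE_NAME = "mydb"

# TLS is enabled in the URI itself (tls=true) rather than via connector
# options, and retryWrites=false accounts for DocumentDB limitations.
CONNECTION_URI = (
    f"mongodb://{USERNAME}:{PASSWORD}@{ENDPOINT}:27017/{DATABASE_NAME}"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
    "&tls=true&retryWrites=false"
)

# The CA certificate goes into a JVM truststore instead of ssl.CAFile.
# Set these as cluster-level Spark config so BOTH driver and executors
# pick them up (paths are assumptions):
#   spark.driver.extraJavaOptions   -Djavax.net.ssl.trustStore=/dbfs/certs/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
#   spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
```

With TLS handled by the URI and the truststore, the read itself reduces to `spark.read.format("mongodb").option("spark.mongodb.read.connection.uri", CONNECTION_URI).option("collection", COLLECTION).load()`.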

 

Hello @bianca_unifeye,

I was able to solve this issue by adding a JVM truststore. It involved modifying the default Java cacerts file by appending the custom certificate to it. I followed the Databricks KB article "How to import a custom CA certificate". Thanks for the response!
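For anyone following the same route, the KB approach of appending a custom certificate to the default JVM cacerts can be sketched as a cluster init script. The paths, alias, and password below are assumptions, and note that if your CA bundle PEM contains multiple certificates, keytool imports only the first one, so the bundle may need to be split first:

```shell
#!/bin/bash
# Sketch of a cluster init script: import a custom CA certificate into
# the default JVM truststore (cacerts), as described in the Databricks KB.

CA_PEM="/dbfs/certs/documentdb-ca.pem"                 # hypothetical PEM location
TRUSTSTORE="$JAVA_HOME/lib/security/cacerts"           # default JVM truststore
STOREPASS="changeit"                                   # default cacerts password

# Append the custom cert to the default cacerts under a new alias.
keytool -importcert -trustcacerts -noprompt \
  -alias documentdb-ca \
  -file "$CA_PEM" \
  -keystore "$TRUSTSTORE" \
  -storepass "$STOREPASS"
```

Because the script modifies the JVM's default truststore in place, no extra `-Djavax.net.ssl.trustStore` option is needed afterwards.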

Rijin Thomas

Glad I was able to help!

Sanjeeb2024
Valued Contributor

Hi @rijin-thomas - Can you please also allow the CIDR block of the Databricks account VPC in the AWS DocumentDB security group (the executor connectivity point stated by @bianca_unifeye)?

Sanjeeb Mohapatra