bianca_unifeye
Databricks MVP

If PyMongo works but the Spark connector times out, the issue is almost always JVM TLS configuration or executor-level network access, not credentials or the database itself.

  • TLS handling (most common cause):
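    The MongoDB Spark connector runs on the JVM and does not read CA PEM files the way PyMongo does. Import the CA certificate into a JVM truststore (JKS or PKCS12) instead of pointing to it with ssl.CAFile, and pass the truststore to both driver and executors through JVM options (javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword).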

  • Executor connectivity:
    A PyMongo test usually runs on the driver only, so it proves connectivity from the driver and nothing more. Spark reads from the executors, so confirm that every worker node can reach the DocumentDB endpoint on port 27017 (security groups, routes, DNS).

  • Enable TLS via URI:
    Set TLS explicitly in the connection string (e.g. tls=true) rather than relying on connector options.

  • DocumentDB compatibility:
    Add retryWrites=false to the connection string, because Amazon DocumentDB does not support retryable writes.
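
To make the points above concrete, here is a minimal Python sketch, not a definitive setup. The helper names (build_docdb_uri, truststore_java_options, can_reach) are hypothetical and only for illustration; the host, credentials, truststore path and password are placeholders you would replace. It builds a URI with tls=true and retryWrites=false, builds the JVM truststore options string, and provides a plain TCP check you can run on the executors.

```python
import socket
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(host, user, password, port=27017, database=""):
    """Build a MongoDB URI with TLS enabled and retryable writes disabled."""
    params = urlencode({"tls": "true", "retryWrites": "false"})
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?{params}"
    )


def truststore_java_options(truststore_path, truststore_password):
    """Return standard JVM system properties that point at a JKS/PKCS12 truststore.

    Assumption: the truststore file exists at the same path on the driver
    and on every executor node.
    """
    return (
        f"-Djavax.net.ssl.trustStore={truststore_path} "
        f"-Djavax.net.ssl.trustStorePassword={truststore_password}"
    )


def can_reach(host, port=27017, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    This only checks network reachability (security groups, routes, DNS);
    it does not perform a TLS handshake or authenticate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A possible way to use it: put the string returned by truststore_java_options into spark.driver.extraJavaOptions and spark.executor.extraJavaOptions in the cluster's Spark config, since JVM options have to be in place before the JVMs start. To test from the executors rather than the driver, something like sc.parallelize(range(8), 8).map(lambda _: can_reach("<your-docdb-endpoint>")).collect() should work, assuming sc is the notebook's SparkContext; any False in the result suggests a worker-level network problem rather than a TLS one.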