Hi. I have a self-managed MongoDB instance running on an EC2 host in the same AWS VPC as my Databricks cluster, but I cannot get Databricks to talk to it.
I've followed the guide at https://docs.databricks.com/aws/en/connect/external-systems/mongodb and have also reviewed the MongoDB guidance at https://www.mongodb.com/docs/spark-connector/current/getting-started/ but to no avail.
I've tried adding the MongoDB settings to the cluster's Spark configuration, and also configuring them locally within the notebook.
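For the cluster-level attempt, the Spark config entries were along these lines (the same keys as in the notebook code below, with the IP redacted as x.x.x.x):

    spark.mongodb.read.connection.uri mongodb://x.x.x.x:27017/
    spark.mongodb.write.connection.uri mongodb://x.x.x.x:27017/

And this is the notebook version: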
from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.read.connection.uri", "mongodb://x.x.x.x:27017/") \
    .config("spark.mongodb.write.connection.uri", "mongodb://x.x.x.x:27017/") \
    .getOrCreate()

database = "mydatabase"
collection = "mycollection"

df = my_spark.read.format("mongodb") \
    .option("database", database) \
    .option("collection", collection) \
    .load()
However, on each run I get the following error, regardless of how I configure things:
(com.mongodb.MongoTimeoutException) Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
I've verified network connectivity to the EC2 host running the MongoDB instance, but from the error it looks like the connector is trying to connect to localhost:27017 rather than the IP I've configured. Is this error misleading, or am I missing something in the config?
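For reference, the connectivity check was nothing more than a basic TCP test from a notebook cell, roughly along these lines (x.x.x.x again standing in for the EC2 instance's private IP), and it succeeds:

    import socket

    # Basic TCP reachability test against the MongoDB host from the Databricks driver
    sock = socket.create_connection(("x.x.x.x", 27017), timeout=5)
    print("connected to", sock.getpeername())
    sock.close()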
I'm out of ideas so looking for some help/guidance. Thanks!