Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

MongoDB Spark Connection Issues

Kirki
New Contributor II

Hi. I have a local MongoDB running on an EC2 instance in the same AWS VPC as my Databricks cluster but cannot get Databricks to talk to MongoDB. 

I've followed the guide at https://docs.databricks.com/aws/en/connect/external-systems/mongodb and have also reviewed the MongoDB guidance at https://www.mongodb.com/docs/spark-connector/current/getting-started/ but to no avail.

I've attempted adding the MongoDB configuration to the cluster Spark configuration, and configuring locally within the Notebook.

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.read.connection.uri", "mongodb://x.x.x.x:27017/") \
    .config("spark.mongodb.write.connection.uri", "mongodb://x.x.x.x:27017/") \
    .getOrCreate()

database = "mydatabase"
collection = "mycollection"
 
df = my_spark.read.format("mongodb") \
    .option("database", database) \
    .option("collection", collection) \
    .load()
 
However, on each run, I get the following error regardless of how I configure things:

(com.mongodb.MongoTimeoutException) Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]

I've verified connectivity with the EC2 host that is running the MongoDB instance, but from the error, it looks like it is attempting to connect to localhost:27017, rather than the IP I've configured. Is this just a bogus error or am I missing something in the config?

I'm out of ideas so looking for some help/guidance. Thanks!

1 REPLY 1

Isi
Honored Contributor II

Hey @Kirki, maybe it's late, but I'll try to help you (or anyone else) get this connection working.

First thing, make sure the connector library is installed on your cluster. One detail worth checking: format("mongodb"), which you are using, belongs to the 10.x connector, while the 3.x line registers format("mongo"). So either install a 10.x release, e.g.

org.mongodb.spark:mongo-spark-connector_2.12:10.1.1

or, if you stay on 3.x (org.mongodb.spark:mongo-spark-connector_2.12:3.0.1), switch your code to format("mongo") and the spark.mongodb.input.uri / spark.mongodb.output.uri settings.


One likely cause of the localhost:27017 in your error: on Databricks, SparkSession.builder.getOrCreate() in a notebook returns the already-running session, so .config(...) calls made there are ignored and the connector falls back to its default. Instead, pass the URI directly in your spark.read call (with the 10.x connector the option key is "connection.uri"):

df = spark.read.format("mongodb") \
    .option("connection.uri", "mongodb://<your-ec2-ip>:27017/") \
    .option("database", "mydatabase") \
    .option("collection", "mycollection") \
    .load()

or, if you are on the legacy 3.x connector:

df = spark.read.format("mongo") \
    .option("spark.mongodb.input.uri", "mongodb://<your-ec2-ip>:27017/mydatabase.mycollection") \
    .load()

Or set it in the Spark config of your cluster (these are the 10.x keys; note that mongodb+srv:// URIs are for DNS-seedlist deployments like Atlas, so for a self-hosted EC2 instance use a plain mongodb:// URI):

spark.mongodb.read.connection.uri mongodb://<user>:<pass>@<ec2-ip>:27017/
spark.mongodb.write.connection.uri mongodb://<user>:<pass>@<ec2-ip>:27017/
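One more gotcha if your mongod has authentication enabled: special characters in the username or password must be percent-encoded before going into the URI, or the driver will mis-parse it. A minimal sketch using only the Python standard library (the credentials and <ec2-ip> are placeholders, not real values):

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only
user = "appUser"
password = "p@ss/word"

# quote_plus() percent-encodes characters like '@' and '/',
# which would otherwise break connection-string parsing
uri = f"mongodb://{quote_plus(user)}:{quote_plus(password)}@<ec2-ip>:27017/"
print(uri)
```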

 

Try connecting directly from a Databricks notebook:

%sh
nc -zv <ec2-ip> 27017
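If nc isn't available on your cluster image, the same reachability check can be done from a Python cell (a minimal sketch; swap in your EC2 IP before running):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False

# Replace with your EC2 private IP
print(port_open("<ec2-ip>", 27017))
```

If this prints False, the problem is networking (security groups, bind address) rather than the Spark connector configuration.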


Since the MongoDB host is an EC2 instance, also check the host side:
Make sure mongod is listening on 0.0.0.0 (not just 127.0.0.1) and that the instance's security group allows inbound traffic on port 27017 from your Databricks cluster
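For reference, the bind address lives in the mongod configuration file (typically /etc/mongod.conf on Linux); a sketch of the relevant section, assuming a default package install (restart mongod after editing):

```yaml
# /etc/mongod.conf
net:
  port: 27017
  bindIp: 0.0.0.0   # listen on all interfaces, not just 127.0.0.1
```

Binding to 0.0.0.0 exposes the port to the whole VPC, so keep the security group rule scoped to your Databricks cluster's subnet and keep authentication enabled.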

Hope this helps 🙂

Isi