09-18-2025 06:24 AM
Hello everyone,
Does anyone know a technique for writing a Delta table to MongoDB using the MongoDB Spark connector org.mongodb.spark:mongo-spark-connector_2.12:10.5.0?
I have 1 billion records to write.
Thanks
09-18-2025 06:36 AM - edited 09-18-2025 06:37 AM
Hi @seefoods ,
Yes, this is well described on the MongoDB Spark Connector documentation page. To write data to MongoDB, use write on your DataFrame object. It returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.
Here's an example of how to use it:
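# Sample data; None is stored as a null age
dataFrame = spark.createDataFrame(
    [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
     ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)],
    ["name", "age"],
)
# The outer parentheses let the method chain span several lines.
# The MongoDB connection URI is assumed to be set in the Spark configuration.
(dataFrame.write.format("mongodb")
    .mode("append")
    .option("database", "people")
    .option("collection", "contacts")
    .save())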
One thing to note here: the MongoDB Spark Connector supports the following save modes:
append
overwrite
So, in your case, just read the Delta table into a DataFrame and use the DataFrameWriter object as described above.
Write to MongoDB in Batch Mode - Spark Connector - MongoDB Docs
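For the Delta-to-MongoDB case in the question, a minimal sketch could look like the one below. The helper names (mongo_write_options, write_delta_to_mongo), the Delta path, the URI, and the chosen values are all assumptions for illustration, not something from the connector docs. maxBatchSize and ordered are write options of the 10.x connector; whether raising the batch size or disabling ordered writes actually helps with 1 billion records depends on your cluster and should be measured.

```python
def mongo_write_options(uri, database, collection, max_batch_size=1024, ordered=False):
    """Build the option map for a batch write (hypothetical helper).

    Values are passed as strings because Spark treats writer options as strings.
    """
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        # Number of documents sent per bulk operation (assumed tuning value).
        "maxBatchSize": str(max_batch_size),
        # Unordered bulk writes do not stop at the first failed document.
        "ordered": str(ordered).lower(),
    }


def write_delta_to_mongo(spark, delta_path, options, num_partitions=None):
    """Read a Delta table from delta_path and append it to MongoDB."""
    df = spark.read.format("delta").load(delta_path)
    if num_partitions is not None:
        # Each partition is written by its own task, so this controls
        # how many writers hit MongoDB in parallel.
        df = df.repartition(num_partitions)
    df.write.format("mongodb").mode("append").options(**options).save()
```

Usage, with placeholder values: write_delta_to_mongo(spark, "/mnt/delta/contacts", mongo_write_options("mongodb://host:27017", "people", "contacts"), num_partitions=64). Too many parallel writers can overload the MongoDB cluster, so start with a small partition count and increase it gradually.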
Edit: This connector also supports streaming mode, so that is worth considering if you want an easy way to load data incrementally from a Delta table into MongoDB.
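A rough sketch of the streaming variant, assuming the Delta table is readable as a stream and that a checkpoint location is available. The function names, paths, and URI are hypothetical, and the availableNow trigger (Spark 3.3+) is an assumption about how you would want to run it: it processes what is currently in the table and then stops, and the checkpoint lets the next run pick up only new data.

```python
def mongo_stream_options(uri, database, collection, checkpoint_path):
    """Build the option map for a streaming write (hypothetical helper)."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
        # Spark records stream progress here so a restart resumes where it left off.
        "checkpointLocation": checkpoint_path,
    }


def stream_delta_to_mongo(spark, delta_path, options):
    """Start a streaming query that copies new Delta rows into MongoDB."""
    source = spark.readStream.format("delta").load(delta_path)
    query = (
        source.writeStream.format("mongodb")
        .options(**options)
        .outputMode("append")
        # Assumed trigger: process all available data, then stop.
        .trigger(availableNow=True)
        .start()
    )
    return query
```

Call query.awaitTermination() on the returned object if the job should block until the available data has been written.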