Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

mongodb spark

seefoods
Contributor III

Hello Guys,

Does anyone know a technique for writing a Delta table to MongoDB using the connector org.mongodb.spark:mongo-spark-connector_2.12:10.5.0?

I have 1 billion records to write.

Thanks 


1 ACCEPTED SOLUTION


szymon_dybczak
Esteemed Contributor III

Hi @seefoods ,

Yes, it's well described on the MongoDB connector documentation page. To write data to MongoDB, you need to call the write function on your DataFrame object. This returns a DataFrameWriter object, which you can use to specify the format and other configuration settings for your batch write operation.


Here's an example of how to use it:

 

# Assumes the connection string is provided via the Spark config,
# e.g. spark.mongodb.write.connection.uri (connector 10.x).
dataFrame = spark.createDataFrame(
    [("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178),
     ("Kili", 77), ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82),
     ("Bombur", None)],
    ["name", "age"],
)

# Wrap the chained calls in parentheses so the multi-line expression parses.
(dataFrame.write.format("mongodb")
    .mode("append")
    .option("database", "people")
    .option("collection", "contacts")
    .save())

 

 

One thing to note here: the MongoDB Spark Connector supports the following save modes:

  • append

  • overwrite

So, in your case, just read the Delta table into a DataFrame and use the DataFrameWriter object as described above (see the sketch after the link below).

Write to MongoDB in Batch Mode - Spark Connector - MongoDB Docs
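Here's a minimal sketch of that batch path, assuming a hypothetical source table name and that the write connection URI is already set in the Spark config; the repartition value is purely illustrative:

# Read the Delta table into a DataFrame (table name is hypothetical).
df = spark.read.table("my_catalog.my_schema.events")

# For a billion-row write you may want to control parallelism explicitly;
# each partition is written to MongoDB in parallel. Tune for your cluster.
df = df.repartition(200)

(df.write.format("mongodb")
    .mode("append")
    .option("database", "people")
    .option("collection", "contacts")
    .save())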

 

 

Edit: This connector also supports streaming mode, which you can also consider if you want an easy way to load data incrementally from a Delta table to MongoDB (see the sketch after the link below).

Streaming Mode - Spark Connector - MongoDB Docs
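A minimal sketch of the streaming path, assuming hypothetical table, database, collection, and checkpoint names:

# Read the Delta table as a stream; Delta is a supported streaming source.
events = spark.readStream.table("my_catalog.my_schema.events")

# Write the stream to MongoDB; the checkpoint lets the query resume
# incrementally instead of reprocessing the whole table.
query = (events.writeStream
    .format("mongodb")
    .option("checkpointLocation", "/tmp/checkpoints/mongo_sink")
    .option("spark.mongodb.connection.uri", "mongodb://<host>:27017")
    .option("spark.mongodb.database", "people")
    .option("spark.mongodb.collection", "contacts")
    .outputMode("append")
    .start())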


