08-27-2025 09:58 AM - edited 08-27-2025 10:07 AM
@seefoods, you can use Databricks to synchronize in both directions with the MongoDB Spark connector, which supports both batch and streaming modes.
The most straightforward approach would be to create a pipeline that, once a day, reads from MongoDB and writes the data to a Delta table.
Then, you would need to create a similar pipeline in the opposite direction, where once a day the pipeline reads from the Delta table and writes the data back to MongoDB.
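As a rough illustration of the two daily batch pipelines, here is a minimal PySpark sketch. It assumes version 10.x of the connector (data source name `mongodb`, short-form option keys). The URI, database, collection, and table names are placeholders I made up, and `spark` is the session your Databricks notebook or job already provides.

```python
from typing import Dict

# Placeholder values for illustration only; replace with your own.
MONGO_URI = "mongodb://localhost:27017"
MONGO_DB = "shop"
MONGO_COLLECTION = "orders"
DELTA_TABLE = "main.default.orders"


def mongo_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build the connector options shared by the read and the write."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def mongo_to_delta(spark, opts: Dict[str, str], delta_table: str) -> None:
    """Daily job 1: read the whole collection and write it to a Delta table."""
    df = spark.read.format("mongodb").options(**opts).load()
    # "overwrite" replaces the table with the latest snapshot; this is one
    # possible choice, not the only one.
    df.write.format("delta").mode("overwrite").saveAsTable(delta_table)


def delta_to_mongo(spark, delta_table: str, opts: Dict[str, str]) -> None:
    """Daily job 2: read the Delta table and write it back to MongoDB."""
    df = spark.read.table(delta_table)
    df.write.format("mongodb").mode("append").options(**opts).save()
```

In a scheduled job you would call something like `mongo_to_delta(spark, mongo_options(MONGO_URI, MONGO_DB, MONGO_COLLECTION), DELTA_TABLE)`, and the mirror call for the opposite direction.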
If you want, you can take a more ambitious approach. Since the connector supports streaming, you could set up a job that, once a day, reads only the changes applied to your MongoDB database instead of the full collection.
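A sketch of what that streaming read could look like, again assuming connector 10.x. The schema string, option values, table name, and checkpoint path are hypothetical, and I have left out the trigger because how you schedule the daily run depends on your job setup.

```python
from typing import Dict

# Hypothetical schema of the collection, written as a Spark DDL string.
ORDERS_SCHEMA = "_id STRING, customer STRING, amount DOUBLE"


def change_stream_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Options for reading a MongoDB change stream as a Spark stream."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        # Emit only the changed document rather than the full change event.
        "change.stream.publish.full.document.only": "true",
    }


def stream_mongo_changes_to_delta(spark, opts: Dict[str, str], schema: str,
                                  delta_table: str, checkpoint: str):
    """Append documents changed in MongoDB to a Delta table."""
    changes = (
        spark.readStream.format("mongodb")
        .options(**opts)
        .schema(schema)
        .load()
    )
    return (
        changes.writeStream.format("delta")
        # The checkpoint lets the next run resume where this one stopped.
        .option("checkpointLocation", checkpoint)
        .outputMode("append")
        .toTable(delta_table)
    )
```

Note that this appends every changed document as a new row, so the target behaves like a change log. If you need one row per document, you would still have to merge those rows into your final Delta table.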
Similarly, you can enable Change Data Feed (CDF) on your Delta table and use streaming to read only the changes applied there, then write those incremental updates back to your MongoDB collection.
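For the Delta to MongoDB direction, a minimal sketch under the same assumptions (connector 10.x, made-up names). It forwards only inserts and post-update rows; deletes are not handled here, and whether the write behaves as an upsert depends on the connector's write settings, so please check those against the docs.

```python
from typing import Dict

# Metadata columns that Change Data Feed adds to every change row.
CDF_METADATA_COLUMNS = ["_change_type", "_commit_version", "_commit_timestamp"]
# Change types that carry the new state of a row.
UPSERT_CHANGE_TYPES = ["insert", "update_postimage"]


def enable_cdf_sql(table: str) -> str:
    """SQL statement that turns on Change Data Feed for an existing table."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def push_delta_changes_to_mongo(spark, delta_table: str,
                                opts: Dict[str, str], checkpoint: str):
    """Stream Delta changes and write each micro-batch to MongoDB."""

    def write_batch(batch_df, batch_id):
        # Keep new and updated rows, then drop the CDF bookkeeping columns.
        upserts = (
            batch_df
            .filter(batch_df["_change_type"].isin(UPSERT_CHANGE_TYPES))
            .drop(*CDF_METADATA_COLUMNS)
        )
        upserts.write.format("mongodb").mode("append").options(**opts).save()

    return (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .table(delta_table)
        .writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", checkpoint)
        # Process everything available, then stop; suits a once-a-day job.
        .trigger(availableNow=True)
        .start()
    )
```

You would run `spark.sql(enable_cdf_sql("main.default.orders"))` once before the first run, since the feed only records changes made after it is enabled.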
For further reading - batch mode:
https://www.mongodb.com/docs/spark-connector/current/batch-mode/batch-read
https://www.mongodb.com/docs/spark-connector/current/batch-mode/batch-write
For further reading - streaming mode:
https://www.mongodb.com/docs/spark-connector/current/streaming-mode/streaming-read
https://www.mongodb.com/docs/spark-connector/current/streaming-mode/streaming-write
MongoDB ChangeStream & Spark Delta Table : An Alliance | by Rajesh Vinayagam | Medium