Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Streaming Amazon DocumentDB to Databricks in near real time - what's the best approach?

AustinBen
Visitor

Hi everyone,

I'm looking for advice from anyone who has implemented near real-time ingestion from Amazon DocumentDB into Databricks.

Our current architecture is:

  • Application → Amazon DocumentDB

  • Python AWS Lambda functions capture changes from DocumentDB

  • Lambda continuously writes the data into Amazon Redshift

  • Redshift is then used as our data warehouse

This setup has been working well for us.
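
To make the capture step concrete, here is a minimal sketch of what a Lambda handler of this kind might look like. It is illustrative only, not the production code. It assumes the function is triggered by a DocumentDB change stream and receives a batch under an `events` key, with each entry wrapping a change document (`operationType`, `ns`, `documentKey`, `fullDocument`) under `event`. The helper name `flatten_change_events` and the output column names are hypothetical, and the actual load into Redshift is left out.

```python
import json
from typing import Any, Dict, List


def flatten_change_events(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a batch of change-stream events into warehouse-friendly rows.

    Assumed input shape: {"events": [{"event": {<change document>}}, ...]}.
    """
    rows = []
    for wrapper in event.get("events", []):
        change = wrapper.get("event", {})
        ns = change.get("ns", {})
        full_doc = change.get("fullDocument")  # absent for deletes
        rows.append({
            "operation": change.get("operationType"),
            "database": ns.get("db"),
            "collection": ns.get("coll"),
            "document_key": json.dumps(change.get("documentKey"), sort_keys=True),
            "document": json.dumps(full_doc, sort_keys=True) if full_doc is not None else None,
        })
    return rows


def lambda_handler(event, context):
    rows = flatten_change_events(event)
    # Hypothetical next step: hand `rows` to whatever loads the warehouse
    # (Redshift today; an S3 or Kafka landing zone would slot in here too).
    return {"row_count": len(rows)}
```

Keeping the flattening logic separate from the handler means the same rows could be written to a different sink without touching the capture logic.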

We're now evaluating Databricks as our analytics platform, but I'm not finding a straightforward way to stream data directly from DocumentDB into Databricks. I've heard that Databricks doesn't have a native connector or CDC support for Amazon DocumentDB.

My questions are:

  1. Has anyone successfully implemented near real-time or real-time ingestion from Amazon DocumentDB into Databricks?

  2. What architecture are you using?

I'm interested in production-proven architectures rather than proof-of-concept examples.

Thanks in advance!
