Hi everyone,
I'm looking for advice from anyone who has implemented near real-time ingestion from Amazon DocumentDB into Databricks.
Our current architecture is:
1. Application → Amazon DocumentDB
2. Python AWS Lambda functions capture changes from DocumentDB
3. Lambda continuously writes the data into Amazon Redshift
4. Redshift is then used as our data warehouse
This setup has been working well for us.
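To make the Lambda step concrete, here is a minimal sketch of the kind of change-flattening such a function might do. It is illustrative only, not our production code. It assumes the Lambda receives batches shaped like {"events": [{"event": <change document>}]} and that each change document carries the usual change stream fields (operationType, ns, documentKey, fullDocument). The names flatten_change and handler are hypothetical, and the warehouse write is left out on purpose.

```python
import json
from typing import Any


def flatten_change(change: dict[str, Any]) -> dict[str, Any]:
    """Turn one change-stream document into a flat, warehouse-friendly row."""
    ns = change.get("ns", {})
    return {
        "op": change.get("operationType"),
        "db": ns.get("db"),
        "coll": ns.get("coll"),
        # Nested values are serialized to JSON strings so the row stays flat.
        "doc_key": json.dumps(change.get("documentKey"), sort_keys=True, default=str),
        # Deletes carry no fullDocument, so this becomes the string "null".
        "doc": json.dumps(change.get("fullDocument"), sort_keys=True, default=str),
    }


def handler(event, context):
    # Assumed batch shape: {"events": [{"event": <change document>}, ...]}
    rows = [flatten_change(item["event"]) for item in event.get("events", [])]
    # A real function would write `rows` to the warehouse here (omitted in this sketch).
    return {"rows": rows, "count": len(rows)}
```

The same flat rows could in principle be written to any sink, which is why I'm wondering whether this step can be reused for Databricks.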
We're now evaluating Databricks as our analytics platform, but I haven't found a straightforward way to stream data directly from DocumentDB into Databricks. I've heard that Databricks has neither a native connector nor built-in CDC support for Amazon DocumentDB.
My questions are:
1. Has anyone successfully implemented near real-time or real-time ingestion from Amazon DocumentDB into Databricks?
2. What architecture are you using?
I'm interested in production-proven architectures rather than proof-of-concept examples.
Thanks in advance!