<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Streaming Amazon DocumentDB to Databricks in near real time - what's the best approach? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/streaming-amazon-documentdb-to-databricks-in-near-real-time-what/m-p/160692#M54938</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm looking for advice from anyone who has implemented near real-time ingestion from &lt;STRONG&gt;Amazon DocumentDB&lt;/STRONG&gt; into &lt;STRONG&gt;Databricks&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Our current architecture is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Application → Amazon DocumentDB&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Python AWS Lambda functions capture changes from DocumentDB&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Lambda continuously writes the data into Amazon Redshift&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Redshift is then used as our data warehouse&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This setup has been working well for us.&lt;/P&gt;&lt;P&gt;We're now evaluating &lt;STRONG&gt;Databricks&lt;/STRONG&gt; as our analytics platform, but I'm not finding a straightforward way to stream data directly from DocumentDB into Databricks. I've heard that Databricks doesn't have a native connector or CDC support for Amazon DocumentDB.&lt;/P&gt;&lt;P&gt;My questions are:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Has anyone successfully implemented near real-time or real-time ingestion from Amazon DocumentDB into Databricks?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;What architecture are you using?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I'm interested in production-proven architectures rather than proof-of-concept examples.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Fri, 26 Jun 2026 15:44:56 GMT</pubDate>
    <dc:creator>AustinBen</dc:creator>
    <dc:date>2026-06-26T15:44:56Z</dc:date>
    <item>
      <title>Streaming Amazon DocumentDB to Databricks in near real time - what's the best approach?</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-amazon-documentdb-to-databricks-in-near-real-time-what/m-p/160692#M54938</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm looking for advice from anyone who has implemented near real-time ingestion from &lt;STRONG&gt;Amazon DocumentDB&lt;/STRONG&gt; into &lt;STRONG&gt;Databricks&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Our current architecture is:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Application → Amazon DocumentDB&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Python AWS Lambda functions capture changes from DocumentDB&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Lambda continuously writes the data into Amazon Redshift&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Redshift is then used as our data warehouse&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;This setup has been working well for us.&lt;/P&gt;&lt;P&gt;We're now evaluating &lt;STRONG&gt;Databricks&lt;/STRONG&gt; as our analytics platform, but I'm not finding a straightforward way to stream data directly from DocumentDB into Databricks. I've heard that Databricks doesn't have a native connector or CDC support for Amazon DocumentDB.&lt;/P&gt;&lt;P&gt;My questions are:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Has anyone successfully implemented near real-time or real-time ingestion from Amazon DocumentDB into Databricks?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;What architecture are you using?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;I'm interested in production-proven architectures rather than proof-of-concept examples.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jun 2026 15:44:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-amazon-documentdb-to-databricks-in-near-real-time-what/m-p/160692#M54938</guid>
      <dc:creator>AustinBen</dc:creator>
      <dc:date>2026-06-26T15:44:56Z</dc:date>
    </item>
  </channel>
</rss>

