<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Architecture choice, streaming data in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13787#M37</link>
    <description>&lt;P&gt;Hi @baatchus, this is a great question. Option 1 is ideal if you require real-time processing of your data. Since you noted that you only need to process data on demand, I think Option 2 is the better choice for you.&lt;/P&gt;&lt;P&gt;Option 1 would require 24/7 processing (i.e. a 24/7 cluster), which is more costly than you need. Since you can do batch processing, Option 2 would be more cost-effective. Event Hubs Capture should let you write the events directly to ADLS without an intermediate tool.&lt;/P&gt;&lt;P&gt;If you ever do require stream processing, it would not be difficult to switch between your two options.&lt;/P&gt;</description>
    <pubDate>Mon, 11 Oct 2021 20:18:10 GMT</pubDate>
    <dc:creator>Ryan_Chynoweth</dc:creator>
    <dc:date>2021-10-11T20:18:10Z</dc:date>
    <item>
      <title>Architecture choice, streaming data</title>
      <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13785#M35</link>
      <description>&lt;P&gt;I have sensor data coming into Azure Event Hub and need some help in deciding how to best ingest it into the Data Lake and Delta Lake:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Option 1:&lt;/P&gt;
&lt;P&gt;Azure Event Hub &amp;gt; Databricks Structured Streaming &amp;gt; Delta Lake (bronze)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Option 2:&lt;/P&gt;
&lt;P&gt;Azure Event Hub &amp;gt; Event Hub Capture to Azure Data Lake Storage Gen2 &amp;gt; Databricks Auto Loader &amp;gt; Delta Lake (bronze)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is no need for real-time processing; the data only has to be processed when needed. Please state the reasons for choosing either option.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Mar 2025 16:41:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13785#M35</guid>
      <dc:creator>baatchus</dc:creator>
      <dc:date>2025-03-20T16:41:08Z</dc:date>
    </item>
    <item>
      <title>Re: Architecture choice, streaming data</title>
      <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13787#M37</link>
      <description>&lt;P&gt;Hi @baatchus, this is a great question. Option 1 is ideal if you require real-time processing of your data. Since you noted that you only need to process data on demand, I think Option 2 is the better choice for you.&lt;/P&gt;&lt;P&gt;Option 1 would require 24/7 processing (i.e. a 24/7 cluster), which is more costly than you need. Since you can do batch processing, Option 2 would be more cost-effective. Event Hubs Capture should let you write the events directly to ADLS without an intermediate tool.&lt;/P&gt;&lt;P&gt;If you ever do require stream processing, it would not be difficult to switch between your two options.&lt;/P&gt;</description>
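      <!--
The cost argument in this reply can be made concrete with a little arithmetic. This is only a rough sketch: the hourly rate and the 30-minute job duration are hypothetical placeholders, not figures from the thread, and the helper name is made up for illustration.

```python
def monthly_cluster_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Cost of a cluster that runs `hours_per_day` hours on each of `days` days."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days


# Hypothetical blended price (VM + DBU) per cluster hour; substitute your own.
HOURLY_RATE = 2.0

always_on = monthly_cluster_cost(HOURLY_RATE, hours_per_day=24)     # continuous stream
daily_batch = monthly_cluster_cost(HOURLY_RATE, hours_per_day=0.5)  # one 30-minute run per day

print(f"always-on: {always_on:.2f}, daily batch: {daily_batch:.2f}")
```

Under these assumed numbers the always-on cluster costs 48 times as much as the daily job. The real ratio depends on how long each batch run takes and on cluster start-up time, which this sketch ignores.
      -->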
      <pubDate>Mon, 11 Oct 2021 20:18:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13787#M37</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-10-11T20:18:10Z</dc:date>
    </item>
    <item>
      <title>Re: Architecture choice, streaming data</title>
      <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13788#M38</link>
      <description>&lt;P&gt;@Ryan Chynoweth&lt;/P&gt;&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;Option 1 can be configured with a trigger-once trigger, so both options can be regarded as batch. The question then becomes which of the two is the better and more cost-effective option. Also keep in mind that Azure Event Hub only has 7-day retention, in case that matters for the architectural decision.&lt;/P&gt;&lt;P&gt;Option 1 (trigger once, every 24 hours):&lt;/P&gt;&lt;P&gt;Azure Event Hub &amp;gt; Databricks Structured Streaming &amp;gt; Delta Lake (bronze)&lt;/P&gt;&lt;P&gt;Option 2 (trigger once, every 24 hours):&lt;/P&gt;&lt;P&gt;Azure Event Hub &amp;gt; Event Hub Capture to Azure Data Lake Storage Gen2 &amp;gt; Databricks Auto Loader &amp;gt; Delta Lake (bronze)&lt;/P&gt;</description>
      <pubDate>Tue, 12 Oct 2021 09:16:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13788#M38</guid>
      <dc:creator>baatchus</dc:creator>
      <dc:date>2021-10-12T09:16:09Z</dc:date>
    </item>
    <item>
      <title>Re: Architecture choice, streaming data</title>
      <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13789#M39</link>
      <description>&lt;P&gt;If a batch job is possible and you need to process the data, I would probably use:&lt;/P&gt;&lt;P&gt;Azure Event Hub (events since the previous job run) &amp;gt; Databricks job that processes them as a DataFrame &amp;gt; save the DataFrame to Delta Lake&lt;/P&gt;&lt;P&gt;No streaming or capture is needed in that case.&lt;/P&gt;</description>
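      <!--
The "events after the previous job run" idea in this reply amounts to keeping a checkpoint between runs. Below is a minimal sketch of that bookkeeping using an in-memory list as a stand-in for one Event Hub partition; the Event class and run_batch function are hypothetical names for illustration, not part of any Azure or Databricks API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    sequence_number: int  # position of the event within its partition
    body: str


def run_batch(events: Iterable[Event], checkpoint: int) -> Tuple[List[Event], int]:
    """Return the events newer than `checkpoint` and the checkpoint to store for the next run."""
    new_events = sorted(
        (e for e in events if e.sequence_number > checkpoint),
        key=lambda e: e.sequence_number,
    )
    next_checkpoint = new_events[-1].sequence_number if new_events else checkpoint
    return new_events, next_checkpoint


# In-memory stand-in for one Event Hub partition.
partition = [Event(0, "t=20.1"), Event(1, "t=20.4")]
first_batch, checkpoint = run_batch(partition, checkpoint=-1)

# A new event arrives before the next scheduled run.
partition.append(Event(2, "t=20.9"))
second_batch, checkpoint = run_batch(partition, checkpoint)
```

In a real job the checkpoint would have to be stored durably and tracked per partition, since sequence numbers are only ordered within a partition. Structured Streaming with a one-time trigger does this bookkeeping for you, which is one reason to prefer it over a hand-rolled batch reader.
      -->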
      <pubDate>Tue, 12 Oct 2021 13:26:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13789#M39</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-12T13:26:51Z</dc:date>
    </item>
    <item>
      <title>Re: Architecture choice, streaming data</title>
      <link>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13790#M40</link>
      <description>&lt;P&gt;I do think that the 7-day retention should be considered. It may be a good idea to go with Option 1 for your data pipeline and use the trigger-once option. But I would also use the &lt;A href="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-enable-through-portal" alt="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-enable-through-portal" target="_blank"&gt;data capture capabilities&lt;/A&gt; in the event hub to archive all your data to a raw landing zone.&lt;/P&gt;</description>
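      <!--
To see why the 7-day retention matters for a job that runs every 24 hours, here is a small sketch of the slack it leaves. It assumes a simplified model: the last successful run read everything available when it started, new events arrive immediately afterwards, and a run that lands exactly on the retention boundary is counted as too late. The function names are hypothetical.

```python
from datetime import timedelta


def last_safe_run(retention: timedelta, interval: timedelta) -> int:
    """Largest k such that a run k intervals after the last success still finds every event."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    k = retention // interval
    if k * interval == retention:
        k -= 1  # treat a run exactly at the retention boundary as too late
    return max(k, 0)


def tolerable_missed_runs(retention: timedelta, interval: timedelta) -> int:
    """Consecutive scheduled runs that may fail before unread events start to expire."""
    return max(last_safe_run(retention, interval) - 1, 0)


missed = tolerable_missed_runs(timedelta(days=7), timedelta(hours=24))
print(missed)
```

Under this model a daily job reading straight from the event hub can miss five runs in a row and still recover on the sixth. Archiving with capture removes that limit, because the files in the landing zone are not subject to the hub's retention window.
      -->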
      <pubDate>Tue, 12 Oct 2021 16:30:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/architecture-choice-streaming-data/m-p/13790#M40</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-10-12T16:30:54Z</dc:date>
    </item>
  </channel>
</rss>

