<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Streaming with Medallion Architecture and Star Schema Help in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108829#M43152</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/50463"&gt;@g96g&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've set up a near real-time (30-minute latency) streaming solution that ingests data from SQL Server into Delta Lake:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Changes in the source SQL Server tables are captured with Change Data Capture (CDC) and written as CSV files to a data lake.&lt;/LI&gt;&lt;LI&gt;A streaming process reads these CSV files as they arrive and applies the changes to Delta tables.&lt;/LI&gt;&lt;LI&gt;The changes (the delta) are identified by comparing timestamps against a logging table.&lt;/LI&gt;&lt;LI&gt;The entire process, from CDC extraction to streaming updates, is orchestrated with Databricks Workflows.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For your use case I recommend DLT as the best solution. If DLT is not an option, use this traditional approach instead:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Delta Lake Change Data Feed (CDF) to capture row-level changes&lt;/LI&gt;&lt;LI&gt;Structured Streaming to read and propagate those changes&lt;/LI&gt;&lt;LI&gt;3-minute micro-batches&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Tue, 04 Feb 2025 17:21:11 GMT</pubDate>
    <dc:creator>MadhuB</dc:creator>
    <dc:date>2025-02-04T17:21:11Z</dc:date>
    <item>
      <title>Streaming with Medallion Architecture and Star Schema Help</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108711#M43126</link>
      <description>&lt;P&gt;What are the best practices for implementing continuous streaming in a &lt;STRONG&gt;Medallion Architecture&lt;/STRONG&gt; with a &lt;STRONG&gt;Star Schema&lt;/STRONG&gt;?&lt;/P&gt;&lt;H1&gt;Use Case:&lt;/H1&gt;&lt;P&gt;We have &lt;STRONG&gt;operational data&lt;/STRONG&gt; and need to enable &lt;STRONG&gt;near real-time reporting&lt;/STRONG&gt; in Power BI, with a maximum latency of &lt;STRONG&gt;3 minutes&lt;/STRONG&gt;. Delta Live Tables (DLT) is not an option.&lt;/P&gt;&lt;H1&gt;Key Questions:&lt;/H1&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;How should we curate &lt;STRONG&gt;dimensions and facts&lt;/STRONG&gt; when moving data from &lt;STRONG&gt;Silver to Gold&lt;/STRONG&gt; using &lt;STRONG&gt;Structured Streaming&lt;/STRONG&gt;?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Could you provide examples or proven approaches for &lt;STRONG&gt;fact-dimension joins&lt;/STRONG&gt; in a streaming context?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;How can we use CDC here?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;If anything needs clarification, I'm happy to answer further questions.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Feb 2025 07:53:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108711#M43126</guid>
      <dc:creator>g96g</dc:creator>
      <dc:date>2025-02-04T07:53:47Z</dc:date>
    </item>
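    <!--
A minimal sketch of one common approach to the fact-dimension join question above. This is an assumption and is not stated anywhere in this thread. In each micro-batch, look up the dimension's surrogate key for every fact row, and fall back to an "Unknown" member when the dimension row has not arrived yet. In Databricks this is typically a stream-static left join, where a Delta dimension table is read at its latest version on every micro-batch. The pure-Python model below shows only the lookup logic. The function names, the `is_current` flag (SCD Type 2 style) and the `-1` unknown-member key are all hypothetical.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical convention: surrogate key of the "Unknown" row seeded in every dimension.
UNKNOWN_MEMBER_KEY = -1


def build_key_lookup(dim_rows: list[Record], business_key: str, surrogate_key: str) -> dict[Any, Any]:
    """Map each business key to the surrogate key of its current dimension row."""
    return {
        row[business_key]: row[surrogate_key]
        for row in dim_rows
        # Rows without the hypothetical is_current flag are treated as current.
        if row.get("is_current", True)
    }


def enrich_fact_batch(
    fact_rows: list[Record], lookup: dict[Any, Any], business_key: str, surrogate_key: str
) -> list[Record]:
    """Replace the business key on each fact row with the dimension's surrogate key."""
    enriched = []
    for row in fact_rows:
        out = {col: val for col, val in row.items() if col != business_key}
        # Left-join semantics: unmatched facts point at the Unknown member instead of being dropped.
        out[surrogate_key] = lookup.get(row[business_key], UNKNOWN_MEMBER_KEY)
        enriched.append(out)
    return enriched
```

With this design, facts that arrive before their dimension row are kept but point at the Unknown member. A later correction step would therefore be needed to re-key them once the dimension catches up.
    -->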
    <item>
      <title>Re: Streaming with Medallion Architecture and Star Schema Help</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108803#M43148</link>
      <description>&lt;P&gt;Why not DLT? This is close to an ideal use case for it.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Feb 2025 14:41:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108803#M43148</guid>
      <dc:creator>Rjdudley</dc:creator>
      <dc:date>2025-02-04T14:41:24Z</dc:date>
    </item>
    <item>
      <title>Re: Streaming with Medallion Architecture and Star Schema Help</title>
      <link>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108829#M43152</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/50463"&gt;@g96g&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've set up a near real-time (30-minute latency) streaming solution that ingests data from SQL Server into Delta Lake:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Changes in the source SQL Server tables are captured with Change Data Capture (CDC) and written as CSV files to a data lake.&lt;/LI&gt;&lt;LI&gt;A streaming process reads these CSV files as they arrive and applies the changes to Delta tables.&lt;/LI&gt;&lt;LI&gt;The changes (the delta) are identified by comparing timestamps against a logging table.&lt;/LI&gt;&lt;LI&gt;The entire process, from CDC extraction to streaming updates, is orchestrated with Databricks Workflows.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For your use case I recommend DLT as the best solution. If DLT is not an option, use this traditional approach instead:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Delta Lake Change Data Feed (CDF) to capture row-level changes&lt;/LI&gt;&lt;LI&gt;Structured Streaming to read and propagate those changes&lt;/LI&gt;&lt;LI&gt;3-minute micro-batches&lt;/LI&gt;&lt;/OL&gt;</description>
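      <!--
In the setup described in this reply, new changes are found by comparing timestamps against a logging table. That is a high-water-mark pattern. The sketch below is a minimal pure-Python version. It assumes that each source row carries a `modified_at` timestamp and that the logging table stores the last processed timestamp per table, modeled here as an in-memory dict. `Row`, `extract_changes`, `run_incremental_load` and the `dbo.orders` table name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    """Hypothetical source row with a last-modified timestamp."""
    id: int
    value: str
    modified_at: datetime


def extract_changes(
    source_rows: list[Row], load_log: dict[str, datetime], table_name: str
) -> tuple[list[Row], datetime]:
    """Return rows modified after the logged high-water mark, plus the new mark."""
    last_mark = load_log.get(table_name, datetime.min)
    changed = [r for r in source_rows if r.modified_at > last_mark]
    new_mark = max((r.modified_at for r in changed), default=last_mark)
    return changed, new_mark


def run_incremental_load(
    source_rows: list[Row], load_log: dict[str, datetime], table_name: str
) -> list[Row]:
    """Pull one batch of changes and advance the mark in the (in-memory) logging table."""
    changed, new_mark = extract_changes(source_rows, load_log, table_name)
    # In a real pipeline, apply `changed` to the Delta table first and only then
    # persist the new mark, so that a failed write is retried on the next run.
    load_log[table_name] = new_mark
    return changed
```

A strict `>` comparison can skip a row that is committed later but carries the same timestamp as the stored mark. One way to avoid this is to compare with `>=` and deduplicate on the key when applying the changes.
      -->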
      <pubDate>Tue, 04 Feb 2025 17:21:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/streaming-with-medalion-architchture-and-star-schema-help/m-p/108829#M43152</guid>
      <dc:creator>MadhuB</dc:creator>
      <dc:date>2025-02-04T17:21:11Z</dc:date>
    </item>
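    <!--
The traditional approach recommended in the reply above combines Change Data Feed, Structured Streaming and 3-minute micro-batches. That approach typically ends with a per-batch upsert into the Gold table. In PySpark the pieces would usually be:

- `spark.readStream.format("delta").option("readChangeFeed", "true").table(...)` to read the change feed,
- a `foreachBatch` sink that runs a Delta `MERGE`, and
- `trigger(processingTime="3 minutes")` to set the micro-batch interval.

The pure-Python sketch below models only the merge logic for a single micro-batch. It uses the CDF metadata columns `_change_type` and `_commit_version`. The function names, and the in-memory dict that stands in for the target table, are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def latest_change_per_key(cdf_rows: list[Record], key: str) -> dict[Any, Record]:
    """Keep only the most recent change for each key within one micro-batch."""
    latest: dict[Any, Record] = {}
    for row in cdf_rows:
        if row["_change_type"] == "update_preimage":
            continue  # pre-image holds the old values; the post-image carries the update
        k = row[key]
        # Later commit versions win; within the same version, later rows win.
        if k not in latest or row["_commit_version"] >= latest[k]["_commit_version"]:
            latest[k] = row
    return latest


def apply_micro_batch(target: dict[Any, Record], cdf_rows: list[Record], key: str) -> None:
    """Upsert or delete rows in `target` the way a MERGE inside foreachBatch would."""
    for k, row in latest_change_per_key(cdf_rows, key).items():
        if row["_change_type"] == "delete":
            target.pop(k, None)
        else:
            # Drop CDF metadata columns (assumes business columns never start with "_").
            target[k] = {col: val for col, val in row.items() if not col.startswith("_")}
```

Deduplicating per key before merging matters because a Delta `MERGE` fails when several source rows match the same target row.
    -->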
  </channel>
</rss>

