<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Lakehouse sync tables over rolling history in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakehouse-sync-tables-over-rolling-history/m-p/154727#M54128</link>
    <description>&lt;P&gt;Hi,&lt;BR /&gt;we're exploring replacing one of the use cases we run at our cloud provider with a Databricks pipeline. We have explored the possibility of subscribing to an Event Hub using SDP pipelines, feeding our IoT data into a Delta table where we keep full history.&lt;BR /&gt;This is impractical for use in web applications due to high latency. Our idea is to use a Lakehouse synced table; however, we don't want to sync the full history (too much data). The current idea is to have a dedicated Delta table filled by an SDP stream (containing near-real-time IoT data), which would then perhaps be periodically cleaned to contain only the last 7 days of data. That table would then be synced. There are all kinds of pitfalls - combining streaming ingestion with batch clean-up, syncing to Lakehouse. Is this idea feasible? If not, how could these requirements be fulfilled?&lt;/P&gt;</description>
    <pubDate>Thu, 16 Apr 2026 11:33:44 GMT</pubDate>
    <dc:creator>leopold_cudzik</dc:creator>
    <dc:date>2026-04-16T11:33:44Z</dc:date>
    <item>
      <title>Lakehouse sync tables over rolling history</title>
      <link>https://community.databricks.com/t5/data-engineering/lakehouse-sync-tables-over-rolling-history/m-p/154727#M54128</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;we're exploring replacing one of the use cases we run at our cloud provider with a Databricks pipeline. We have explored the possibility of subscribing to an Event Hub using SDP pipelines, feeding our IoT data into a Delta table where we keep full history.&lt;BR /&gt;This is impractical for use in web applications due to high latency. Our idea is to use a Lakehouse synced table; however, we don't want to sync the full history (too much data). The current idea is to have a dedicated Delta table filled by an SDP stream (containing near-real-time IoT data), which would then perhaps be periodically cleaned to contain only the last 7 days of data. That table would then be synced. There are all kinds of pitfalls - combining streaming ingestion with batch clean-up, syncing to Lakehouse. Is this idea feasible? If not, how could these requirements be fulfilled?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2026 11:33:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakehouse-sync-tables-over-rolling-history/m-p/154727#M54128</guid>
      <dc:creator>leopold_cudzik</dc:creator>
      <dc:date>2026-04-16T11:33:44Z</dc:date>
    </item>
    <item>
      <title>Re: Lakehouse sync tables over rolling history</title>
      <link>https://community.databricks.com/t5/data-engineering/lakehouse-sync-tables-over-rolling-history/m-p/154764#M54139</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/217613"&gt;@leopold_cudzik&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;The pattern you are suggesting is feasible, but it’s much easier to manage if you separate history ingestion from the 7-day serving view instead of cleaning the streaming sink table in place.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;A common architecture on Databricks would look like the below...&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;Bronze (full history, not synced):&lt;/STRONG&gt; Event Hub &amp;gt; SDP stream &amp;gt;&amp;nbsp;&lt;CODE class="p8i6j0f"&gt;bronze.iot_events_history&lt;/CODE&gt; (append-only Delta). This is your long-term history for analytics/compliance.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;STRONG&gt;Silver last 7 days (synced):&lt;/STRONG&gt; A second SDP pipeline (or streaming table) reads from bronze.iot_events_history and writes to silver.iot_events_last_7d, enforcing a 7-day window (via event time filter and/or watermarks). A simple scheduled job can periodically delete rows older than 7 days:&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;DELETE&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; silver.iot_events_last_7d
&lt;SPAN class="hljs-keyword"&gt;WHERE&lt;/SPAN&gt; event_time &lt;SPAN class="hljs-operator"&gt;&amp;lt;&lt;/SPAN&gt; &lt;SPAN class="hljs-built_in"&gt;current_timestamp&lt;/SPAN&gt;() &lt;SPAN class="hljs-operator"&gt;-&lt;/SPAN&gt; &lt;SPAN class="hljs-type"&gt;INTERVAL&lt;/SPAN&gt; &lt;SPAN class="hljs-number"&gt;7&lt;/SPAN&gt; DAYS;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="lqznwq0"&gt;&lt;STRONG style="color: #1b3139; font-family: inherit;"&gt;Synced table for the app:&lt;/STRONG&gt;&lt;SPAN&gt; Create a Lakehouse synced table from &lt;/SPAN&gt;&lt;CODE class="p8i6j0f"&gt;silver.iot_events_last_7d&lt;/CODE&gt;&lt;SPAN&gt; into Lakebase Postgres and point your web app to that Postgres table. When the 7-day cleanup runs on the Delta source, those deletions propagate through the sync, so the app only ever sees the rolling window.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
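&lt;P class="p8i6j01 paragraph"&gt;A minimal sketch of the silver table itself (names assumed to match the bronze table above; the scheduled DELETE still handles rows that age out after ingestion):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- Incrementally reads new bronze rows; the event-time filter keeps
-- late/old records out, while the periodic DELETE trims aged rows.
CREATE OR REFRESH STREAMING TABLE silver.iot_events_last_7d AS
SELECT *
FROM STREAM bronze.iot_events_history
WHERE event_time &amp;gt;= current_timestamp() - INTERVAL 7 DAYS;&lt;/CODE&gt;&lt;/PRE&gt;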
&lt;P class="p8i6j01 paragraph"&gt;This avoids most pitfalls. The ingestion stream stays append-only and simple, the 7-day logic lives in a derived table you control, and the Lakehouse synced table only has the small, recent slice you need for low-latency web queries.&lt;/P&gt;
&lt;P&gt;Hopefully, the above answers your main question. However, if the primary system of record for the IoT data is operational (web apps) and analytics are secondary, flip the direction:&lt;/P&gt;
&lt;P&gt;Ingest IoT data directly into Lakebase Postgres (via an Event Hub consumer, the Data API, or app code). Use Lakehouse Sync (Lakebase &amp;gt; Delta) to continuously replicate into Unity Catalog as &lt;CODE class="p8i6j0f"&gt;lb_iot_events_history&lt;/CODE&gt; with full SCD2 history in the lakehouse. Keep only 7 days of data in Lakebase (periodic deletes), while the lakehouse table keeps the full history. This pattern (subset in Lakebase, full history in the lakehouse) is an explicit Lakehouse Sync use case.&lt;/P&gt;
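&lt;P&gt;The Lakebase-side retention job would then be a plain Postgres statement run on a schedule (table and column names are placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- Trim the operational table to a 7-day window; Lakehouse Sync keeps
-- the full SCD2 history in the lakehouse even after these deletes.
DELETE FROM iot_events
WHERE event_time &amp;lt; now() - interval '7 days';&lt;/CODE&gt;&lt;/PRE&gt;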
&lt;P&gt;This is best when Lakebase is your operational DB anyway and the lakehouse is downstream analytics.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2026 20:57:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakehouse-sync-tables-over-rolling-history/m-p/154764#M54139</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-16T20:57:57Z</dc:date>
    </item>
  </channel>
</rss>

