<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT refresh time for combination of streaming and non streaming tables? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112913#M9254</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/126180"&gt;@surajitDE&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;When using both streaming and batch data, the pipeline may not always refresh every 5 seconds. While the streaming table (fact_stream) updates every 5 seconds, the batch table (dim_table) fully reloads each time, adding overhead from repeatedly loading the batch data.&lt;/P&gt;
&lt;P&gt;The actual refresh time depends on the size of dim_table: larger tables take longer to reload, which can delay updates.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Mar 2025 11:15:16 GMT</pubDate>
    <dc:creator>Advika</dc:creator>
    <dc:date>2025-03-18T11:15:16Z</dc:date>
    <item>
      <title>DLT refresh time for combination of streaming and non streaming tables?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112769#M9252</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;@dlt.table&lt;/DIV&gt;&lt;DIV&gt;def joined_table():&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; dim_df = spark.read.table("dim_table") &amp;nbsp;# Reloads every batch&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; fact_df = spark.readStream.table("fact_stream")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return fact_df.join(dim_df, "id", "left")&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 17 Mar 2025 06:29:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112769#M9252</guid>
      <dc:creator>surajitDE</dc:creator>
      <dc:date>2025-03-17T06:29:26Z</dc:date>
    </item>
    <item>
      <title>Re: DLT refresh time for combination of streaming and non streaming tables?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112771#M9253</link>
      <description>&lt;P&gt;The question is: the default DLT pipeline refresh time is 5 seconds, but if I combine streaming and non-streaming data, will it still be 5 seconds?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Mar 2025 06:44:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112771#M9253</guid>
      <dc:creator>surajitDE</dc:creator>
      <dc:date>2025-03-17T06:44:18Z</dc:date>
    </item>
    <item>
      <title>Re: DLT refresh time for combination of streaming and non streaming tables?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112913#M9254</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/126180"&gt;@surajitDE&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;When using both streaming and batch data, the pipeline may not always refresh every 5 seconds. While the streaming table (fact_stream) updates every 5 seconds, the batch table (dim_table) fully reloads each time, adding overhead from repeatedly loading the batch data.&lt;/P&gt;
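&lt;P&gt;As a rough illustration only (a simplified model, not how DLT schedules work internally): if each micro-batch has to process the new streaming rows and also reload the batch table, the next batch cannot start before the current one finishes, so the effective refresh interval is at least the batch duration. The function name and all timings below are hypothetical.&lt;/P&gt;

```python
def effective_refresh_seconds(trigger_interval_s: float,
                              stream_batch_s: float,
                              dim_reload_s: float) -> float:
    """Simplified model of the time between two consecutive updates.

    Assumption: one micro-batch costs the streaming work plus a full
    reload of the batch table, and a new micro-batch starts either at
    the next trigger tick or right after the previous batch finishes,
    whichever is later.
    """
    batch_duration_s = stream_batch_s + dim_reload_s
    return max(trigger_interval_s, batch_duration_s)


# Hypothetical timings: a small dim_table reloads quickly, so the
# 5-second trigger interval is what limits the refresh rate.
small_dim = effective_refresh_seconds(5.0, stream_batch_s=1.0, dim_reload_s=0.5)

# Hypothetical timings: a large dim_table takes 9 seconds to reload,
# so each update takes about 10 seconds regardless of the trigger.
large_dim = effective_refresh_seconds(5.0, stream_batch_s=1.0, dim_reload_s=9.0)

print(small_dim, large_dim)
```

&lt;P&gt;In this model the trigger interval only sets a lower bound; once the reload of dim_table pushes the batch duration past it, the batch duration determines how often the joined table is updated.&lt;/P&gt;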
&lt;P&gt;The actual refresh time depends on the size of dim_table: larger tables take longer to reload, which can delay updates.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 11:15:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/112913#M9254</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-03-18T11:15:16Z</dc:date>
    </item>
    <item>
      <title>Re: DLT refresh time for combination of streaming and non streaming tables?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/113259#M9255</link>
      <description>&lt;P&gt;In a Delta Live Tables (DLT) continuous pipeline, does it make a difference if df_dim_prev (loaded in cell 1) is only read once at the start?&lt;/P&gt;&lt;P&gt;For example, if df_dim_prev is initialized as:&lt;/P&gt;&lt;P&gt;# Cell 1: Read dim_table once&lt;/P&gt;&lt;P&gt;df_dim_prev = spark.read.table("dim_table")&lt;/P&gt;&lt;P&gt;and then used in a streaming join inside a DLT table:&lt;/P&gt;&lt;P&gt;# Cell 2: DLT table with a streaming source&lt;/P&gt;&lt;P&gt;@dlt.table&lt;/P&gt;&lt;P&gt;def joined_table():&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; dim_df = df_dim_prev &amp;nbsp;# Using the preloaded dimension table&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; fact_df = spark.readStream.table("fact_stream")&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; return fact_df.join(dim_df, "id", "left")&lt;/P&gt;&lt;P&gt;Would this mean that dim_df remains static until the entire pipeline is refreshed, rather than updating dynamically as dim_table changes?&lt;/P&gt;&lt;P&gt;Is there a better way to handle this if we want dim_table to update periodically in a continuous pipeline?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Mar 2025 07:06:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/113259#M9255</guid>
      <dc:creator>surajitDE</dc:creator>
      <dc:date>2025-03-21T07:06:00Z</dc:date>
    </item>
    <item>
      <title>Re: DLT refresh time for combination of streaming and non streaming tables?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/113353#M9256</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;The current approach reloads dim_df in every batch, which can be inefficient. To optimize, consider broadcasting dim_df if it's small, or using a mapGroupsWithState function for stateful joins. Also, ensure that fact_df has sufficient watermarking to handle late data efficiently. Let me know if you need further optimization suggestions!&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Bryce June&lt;/P&gt;</description>
      <pubDate>Sat, 22 Mar 2025 11:03:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/dlt-refresh-time-for-combination-of-streaming-and-non-streaming/m-p/113353#M9256</guid>
      <dc:creator>brycejune</dc:creator>
      <dc:date>2025-03-22T11:03:15Z</dc:date>
    </item>
  </channel>
</rss>

