<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Clarity on usage STREAM while defining DLT tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11546#M6494</link>
    <description>&lt;P&gt;Hi, I am currently learning Databricks and working through the tutorials and learning materials. I came across this link: &lt;I&gt;&lt;U&gt;&lt;A href="https://databricks.com/discover/pages/getting-started-with-delta-live-tables" alt="https://databricks.com/discover/pages/getting-started-with-delta-live-tables" target="_blank"&gt;https://databricks.com/discover/pages/getting-started-with-delta-live-tables&lt;/A&gt;&lt;/U&gt;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;While I understand most of what is described on the page, I find it hard to understand why, when building the silver tier, one of the bronze tables, sales_orders_raw, is referenced with the keyword STREAM while the other bronze table, customers, uses only the marker LIVE. Shouldn't both be marked with STREAM as well as LIVE? Is this a typo?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Lokesh&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 02 Aug 2022 16:51:52 GMT</pubDate>
    <dc:creator>lokeshr</dc:creator>
    <dc:date>2022-08-02T16:51:52Z</dc:date>
    <item>
      <title>Clarity on usage STREAM while defining DLT tables</title>
      <link>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11546#M6494</link>
      <description>&lt;P&gt;Hi, I am currently learning Databricks and working through the tutorials and learning materials. I came across this link: &lt;I&gt;&lt;U&gt;&lt;A href="https://databricks.com/discover/pages/getting-started-with-delta-live-tables" alt="https://databricks.com/discover/pages/getting-started-with-delta-live-tables" target="_blank"&gt;https://databricks.com/discover/pages/getting-started-with-delta-live-tables&lt;/A&gt;&lt;/U&gt;&lt;/I&gt;&lt;/P&gt;&lt;P&gt;While I understand most of what is described on the page, I find it hard to understand why, when building the silver tier, one of the bronze tables, sales_orders_raw, is referenced with the keyword STREAM while the other bronze table, customers, uses only the marker LIVE. Shouldn't both be marked with STREAM as well as LIVE? Is this a typo?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Lokesh&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2022 16:51:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11546#M6494</guid>
      <dc:creator>lokeshr</dc:creator>
      <dc:date>2022-08-02T16:51:52Z</dc:date>
    </item>
    <item>
      <title>Re: Clarity on usage STREAM while defining DLT tables</title>
      <link>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11547#M6495</link>
      <description>&lt;P&gt;This is because, in the example, the "sales_orders" data is streamed, joined (using a left join) to customers, and appended to the silver layer table. When a sales order comes in from a customer that was inserted some time ago (rather than in the current &lt;A href="https://databricks.com/blog/2018/03/20/low-latency-continuous-processing-mode-in-structured-streaming-in-apache-spark-2-3-0.html" alt="https://databricks.com/blog/2018/03/20/low-latency-continuous-processing-mode-in-structured-streaming-in-apache-spark-2-3-0.html" target="_blank"&gt;micro-batch&lt;/A&gt; being processed), the entire customers table has to be loaded to find that customer's id and name. Therefore, using LIVE.customers without "STREAM" allows the join to be a stream-batch join (as described &lt;A href="https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-incremental-data.html#stream-batch-joins" alt="https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-incremental-data.html#stream-batch-joins" target="_blank"&gt;here&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Essentially, because you only need the most recent records coming in from "sales_orders", you can use the "STREAM" keyword there, but the join requires the entire customers table to be loaded, hence the lack of the "STREAM" keyword on that side.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;On the other side of the coin, you need to update the silver layer table only when a new sales order comes in, not when a new customer is streamed into the bronze layer. That's another reason why you only need STREAM on the "sales_orders" table.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2022 18:57:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11547#M6495</guid>
      <dc:creator>tomasz</dc:creator>
      <dc:date>2022-08-03T18:57:14Z</dc:date>
    </item>
    <item>
      <title>Re: Clarity on usage STREAM while defining DLT tables</title>
      <link>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11548#M6496</link>
      <description>&lt;P&gt;Hi @Lokesh Raju,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did Tomasz's response help resolve your question? If it did, please mark it as best.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 17:18:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/clarity-on-usage-stream-while-defining-dlt-tables/m-p/11548#M6496</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-08-30T17:18:59Z</dc:date>
    </item>
  </channel>
</rss>

