Clarity on using STREAM when defining DLT tables

lokeshr
New Contributor

Hi, I am currently learning Databricks and working through tutorials and learning materials. I came across this page: https://databricks.com/discover/pages/getting-started-with-delta-live-tables

While I understand most of what is described on the page, I find it hard to understand why, when building the silver tier, one of the bronze tables, sales_orders_raw, is referenced with the keyword STREAM, while the other bronze table, customers, uses only the LIVE marker. Shouldn't both be marked with STREAM as well as LIVE? Is this a typo?

Regards,

Lokesh


tomasz
Contributor

This is because in the example the "sales_orders" data is streamed, joined (using a left join) to customers, and appended to the silver layer table. When a sales order comes in from a customer who was inserted some time ago (rather than in the micro-batch currently being processed), the entire customer table has to be loaded to find that customer's ID and name. Therefore, referencing LIVE.customers without "STREAM" makes the join a stream-static (stream-batch) join (as described here).
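
To make the stream-static idea concrete, here is a toy model in plain Python. It is not DLT or Spark code, and every name in it (`join_micro_batch`, `run_pipeline`, the tuple layouts) is hypothetical. Each micro-batch carries only the new orders, while the customer lookup sees the whole customers table, so an order from a long-existing customer still finds its name, and an unmatched order keeps a `None` name as a left join would.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy row shapes (not the tutorial's schema):
#   order      -> (order_id, customer_id)
#   silver row -> (order_id, customer_id, customer_name or None)
Order = Tuple[int, int]
SilverRow = Tuple[int, int, Optional[str]]


def join_micro_batch(batch: List[Order], customers: Dict[int, str]) -> List[SilverRow]:
    """Left-join one micro-batch of new orders against the *entire* customers table.

    Only the orders in this batch are processed (the streaming side), but the
    lookup uses every customer row (the static side). Orders with no matching
    customer keep None as the name, mimicking a LEFT JOIN.
    """
    return [(order_id, cust_id, customers.get(cust_id)) for order_id, cust_id in batch]


def run_pipeline(batches: List[List[Order]], customers: Dict[int, str]) -> List[SilverRow]:
    """Process micro-batches in order and append each join result to 'silver'."""
    silver: List[SilverRow] = []
    for batch in batches:
        silver.extend(join_micro_batch(batch, customers))  # append-only
    return silver


if __name__ == "__main__":
    customers = {1: "Alice", 2: "Bob"}  # loaded long before these orders arrive
    batches = [[(100, 1)], [(101, 2), (102, 3)]]
    print(run_pipeline(batches, customers))
    # [(100, 1, 'Alice'), (101, 2, 'Bob'), (102, 3, None)]
```

In the real pipeline the engine handles batching and lookup; the toy only shows why the customer side must be read in full rather than incrementally.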

Essentially, because you only need the most recent records coming in from "sales_orders", you can use the "STREAM" keyword there. The join, however, requires the entire customer table to be loaded, hence the lack of the "STREAM" keyword on customers.
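
The difference between the two read modes can be sketched with another toy Python model, again not the DLT API: `ToyTable` and `ToyStreamReader` are hypothetical classes, and the integer offset merely stands in for the checkpoint a real streaming read keeps. An incremental reader (loosely like `STREAM(...)`) returns only rows it has not seen yet, whereas a full read (loosely like a plain `LIVE.` reference) returns the whole table every time.

```python
from typing import List


class ToyTable:
    """Hypothetical append-only table, used only to contrast two read modes."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def append(self, *new_rows: str) -> None:
        self.rows.extend(new_rows)

    def read_full(self) -> List[str]:
        # Loosely analogous to a non-streaming LIVE reference: every read sees all rows.
        return list(self.rows)


class ToyStreamReader:
    """Loosely analogous to STREAM(...): remembers an offset, returns only unseen rows."""

    def __init__(self, table: ToyTable) -> None:
        self.table = table
        self.offset = 0  # stands in for a streaming checkpoint (hypothetical)

    def read_new(self) -> List[str]:
        new_rows = self.table.rows[self.offset:]
        self.offset = len(self.table.rows)  # advance past everything returned
        return new_rows


if __name__ == "__main__":
    orders = ToyTable()
    reader = ToyStreamReader(orders)
    orders.append("o1", "o2")
    print(reader.read_new())   # ['o1', 'o2']
    orders.append("o3")
    print(reader.read_new())   # ['o3']
    print(orders.read_full())  # ['o1', 'o2', 'o3']
```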

On the other side of the coin, you need to update the silver layer table only when a new sales order comes in, not when a new customer is streamed into the bronze layer. That's another reason why you only need STREAM on the sales_orders table.
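
The triggering point can be illustrated with one more hypothetical plain-Python model (not DLT code; the event tuples and `process_events` are invented for illustration). Only order events append to silver; a customer event merely updates the lookup table. In this toy model, an order that arrived before its customer keeps a `None` name, because rows already appended are not revisited when the customer shows up later.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical event shapes:
#   ("customer", customer_id, customer_name)
#   ("order",    order_id,    customer_id)
Event = Tuple[str, int, Union[int, str]]


def process_events(events: List[Event]) -> List[Tuple[int, Optional[str]]]:
    customers: Dict[int, str] = {}
    silver: List[Tuple[int, Optional[str]]] = []
    for kind, key, payload in events:
        if kind == "customer":
            # New customer: only the lookup side changes; silver is untouched.
            customers[key] = str(payload)
        elif kind == "order":
            # New order: this is the only event that appends a row to silver.
            silver.append((key, customers.get(int(payload))))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return silver


if __name__ == "__main__":
    events: List[Event] = [
        ("customer", 1, "Alice"),
        ("order", 100, 1),
        ("order", 101, 2),       # customer 2 is not known yet
        ("customer", 2, "Bob"),  # adds a customer but writes nothing to silver
    ]
    print(process_events(events))
    # [(100, 'Alice'), (101, None)]
```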

jose_gonzalez
Moderator

Hi @Lokesh Raju,

Just a friendly follow-up. Did Tomasz's response help you resolve your question? If it did, please mark it as best.
