cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

When should I use STREAM() when defining a DLT table?

Mado
Valued Contributor II

Hi,

I am a little confused when I should use STREAM() when we define a table based on a DLT table.

There is a pattern explained in the documentation.

CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
 
AS SELECT * FROM cloud_files(
 
  "s3://path/to/raw/data", "json"
 
)
 
 
 
CREATE OR REFRESH STREAMING LIVE TABLE streaming_silver
 
AS SELECT * FROM STREAM(LIVE.streaming_bronze) WHERE...
 
 
 
CREATE OR REFRESH LIVE TABLE live_gold
 
AS SELECT count(*) FROM LIVE.streaming_silver GROUP BY user_id

In the above code, "live_gold" is a live able. Since "streaming_silver" is a streaming live table, I expected the last line of the code to be:

AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id

However, STREAM() is not used in the definition of "live_gold".

I was wondering when I should use STREAM() when defining a live table.

1 ACCEPTED SOLUTION

Accepted Solutions

LandanG
Honored Contributor
Honored Contributor

@Mohammad Saber​ I think I see what you're saying. You only need to include the STREAM() function when declaring a STREAMING LIVE TABLE, it doesn't matter the source of data.

So since the live_gold table is just a LIVE TABLE and not a STREAMING LIVE TABLE you do not need to include that function. Hopefully, that makes sense

View solution in original post

3 REPLIES 3

LandanG
Honored Contributor
Honored Contributor

Hi @Mohammad Saber​ ,

You use STREAM() when creating a STREAMING LIVE TABLE and reading other tables in the pipeline. The key difference is the STREAMING LIVE TABLE vs. LIVE TABLE.

The definitions of each tables and when to use them are in the docs here.

Mado
Valued Contributor II

Thanks @Landan George​ 

Since "streaming_silver" is a streaming live table, I expected the last line of the code to be:

AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id

But, as you can see the "live_gold" is defined by:

AS SELECT count(*) FROM LIVE.streaming_silver GROUP BY user_id

LandanG
Honored Contributor
Honored Contributor

@Mohammad Saber​ I think I see what you're saying. You only need to include the STREAM() function when declaring a STREAMING LIVE TABLE, it doesn't matter the source of data.

So since the live_gold table is just a LIVE TABLE and not a STREAMING LIVE TABLE you do not need to include that function. Hopefully, that makes sense

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.