topic When should I use STREAM() when defining a DLT table? in Data Engineering

When should I use STREAM() when defining a DLT table?

Mado — Tue, 15 Nov 2022 11:17:36 GMT

Hi,

I am a little confused when I should use STREAM() when we define a table based on a DLT table.

There is a pattern explained in the documentation.

CREATE OR REFRESH STREAMING LIVE TABLE streaming_bronze
 
AS SELECT * FROM cloud_files(
 
  "s3://path/to/raw/data", "json"
 
)
 
 
 
CREATE OR REFRESH STREAMING LIVE TABLE streaming_silver
 
AS SELECT * FROM STREAM(LIVE.streaming_bronze) WHERE...
 
 
 
CREATE OR REFRESH LIVE TABLE live_gold
 
AS SELECT count(*) FROM LIVE.streaming_silver GROUP BY user_id

In the above code, "live_gold" is a live able. Since "streaming_silver" is a streaming live table, I expected the last line of the code to be:

AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id

However, STREAM() is not used in the definition of "live_gold".

I was wondering when I should use STREAM() when defining a live table.

Re: When should I use STREAM() when defining a DLT table?

LandanG — Tue, 15 Nov 2022 20:12:43 GMT

Hi @Mohammad Saber ,

You use STREAM() when creating a STREAMING LIVE TABLE and reading other tables in the pipeline. The key difference is the STREAMING LIVE TABLE vs. LIVE TABLE.

The definitions of each tables and when to use them are in the docs here.

Re: When should I use STREAM() when defining a DLT table?

Mado — Wed, 16 Nov 2022 00:14:55 GMT

Thanks @Landan George

Since "streaming_silver" is a streaming live table, I expected the last line of the code to be:

AS SELECT count(*) FROM STREAM(LIVE.streaming_silver) GROUP BY user_id

But, as you can see the "live_gold" is defined by:

AS SELECT count(*) FROM LIVE.streaming_silver GROUP BY user_id

Re: When should I use STREAM() when defining a DLT table?

LandanG — Wed, 16 Nov 2022 00:51:30 GMT

@Mohammad Saber I think I see what you're saying. You only need to include the STREAM() function when declaring a STREAMING LIVE TABLE, it doesn't matter the source of data.

So since the live_gold table is just a LIVE TABLE and not a STREAMING LIVE TABLE you do not need to include that function. Hopefully, that makes sense