cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

What happens to table created with CTAS statement when data in source table has changed

DatBoi
Contributor

Hey all - I am sure this has been documented / answered before but what happens to a table created with a CTAS statement when data in the source table has changed? Does the sink table reflect the changes? Or is the data stored when the table is defined and handled separately from the source table? The latter is my understanding.

What if you wanted the sink table to be updated with the source table? What would be the best approach? Would you have to manually write a pipeline (probably using the CDF)? Or is there some easy spark configuration for this?

What about views? My understanding is that views are just stored logic. So every time you query a view that logic is executed. Does that mean views will always have the most recent data from the source tables?

I know this is probably documented somewhere so could someone link me towards that? I cannot find it.

Any and all information would be appreciated. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

SergeRielau
Valued Contributor
Valued Contributor

CREATE TABLE AS (CTAS) is a "one and done" kind of statement.
The new table retains no memory on how it came to be.
Therefore it will be oblivious to changes in the source.

Views, as you say, stored queries, no data is persisted. And therefore the query is run everytime the view is referenced. A view can "break" if the query gets sufficiently out of sync with the column list of the view - which is fixed at CREATE time.

Materialized views combine CTAS with VIEWs. They act like views in tracking updates to the data source content and they persist the data to sped you references.
However they do not tolerate changes to the schema of the data source (beyond ADD COLUMN).

To have the object "roll with the punches" so to speak, have a look at structured streaming and Delta Live Tables.



 

View solution in original post

2 REPLIES 2

SergeRielau
Valued Contributor
Valued Contributor

CREATE TABLE AS (CTAS) is a "one and done" kind of statement.
The new table retains no memory on how it came to be.
Therefore it will be oblivious to changes in the source.

Views, as you say, stored queries, no data is persisted. And therefore the query is run everytime the view is referenced. A view can "break" if the query gets sufficiently out of sync with the column list of the view - which is fixed at CREATE time.

Materialized views combine CTAS with VIEWs. They act like views in tracking updates to the data source content and they persist the data to sped you references.
However they do not tolerate changes to the schema of the data source (beyond ADD COLUMN).

To have the object "roll with the punches" so to speak, have a look at structured streaming and Delta Live Tables.



 

Okay cool - thanks for the response.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.