cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

What happens to table created with CTAS statement when data in source table has changed

DatBoi
Contributor

Hey all - I am sure this has been documented / answered before but what happens to a table created with a CTAS statement when data in the source table has changed? Does the sink table reflect the changes? Or is the data stored when the table is defined and handled separately from the source table? The latter is my understanding.

What if you wanted the sink table to be updated with the source table? What would be the best approach? Would you have to manually write a pipeline (probably using the CDF)? Or is there some easy spark configuration for this?

What about views? My understanding is that views are just stored logic. So every time you query a view that logic is executed. Does that mean views will always have the most recent data from the source tables?

I know this is probably documented somewhere so could someone link me towards that? I cannot find it.

Any and all information would be appreciated. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

SergeRielau
Databricks Employee
Databricks Employee

CREATE TABLE AS (CTAS) is a "one and done" kind of statement.
The new table retains no memory on how it came to be.
Therefore it will be oblivious to changes in the source.

Views, as you say, stored queries, no data is persisted. And therefore the query is run everytime the view is referenced. A view can "break" if the query gets sufficiently out of sync with the column list of the view - which is fixed at CREATE time.

Materialized views combine CTAS with VIEWs. They act like views in tracking updates to the data source content and they persist the data to sped you references.
However they do not tolerate changes to the schema of the data source (beyond ADD COLUMN).

To have the object "roll with the punches" so to speak, have a look at structured streaming and Delta Live Tables.



 

View solution in original post

2 REPLIES 2

SergeRielau
Databricks Employee
Databricks Employee

CREATE TABLE AS (CTAS) is a "one and done" kind of statement.
The new table retains no memory on how it came to be.
Therefore it will be oblivious to changes in the source.

Views, as you say, stored queries, no data is persisted. And therefore the query is run everytime the view is referenced. A view can "break" if the query gets sufficiently out of sync with the column list of the view - which is fixed at CREATE time.

Materialized views combine CTAS with VIEWs. They act like views in tracking updates to the data source content and they persist the data to sped you references.
However they do not tolerate changes to the schema of the data source (beyond ADD COLUMN).

To have the object "roll with the punches" so to speak, have a look at structured streaming and Delta Live Tables.



 

Okay cool - thanks for the response.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group