cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to load all the previous day's data only into the newly added column of the existing delta table

shan-databricks
New Contributor III

How to load all the previous day's data only into the newly added column of the existing delta table? Is there any option available to do that without writing any logic?

2 REPLIES 2

szymon_dybczak
Esteemed Contributor III

Hi @shan-databricks ,

There is no out of the box mechanism to do that. You need to implement your own custom logic to handle that scenario.

There are some thing in delta protocol that can make your life easier like schema evolution. 

Schema evolution allows you to resolve schema mismatches between the target and source table in merge. It handles the following two cases:

  1. A column exists in the source table but not the target table.

    That column will be added to the target schema, and its values will be populated from the corresponding column in the source.

    • This only applies when the column name and structure in the merge source exactly match the target assignment.

    • The new column must be directly assigned from the source, without expressions, renaming, or transformations.

  2. A column exists in the target table but not the source table.

    The target schema is not changed. These columns:

    • Are left unchanged for UPDATE SET *.

    • Are set to NULL for INSERT *.

    • May still be explicitly modified if assigned in the action clause.

 

But still you have to write logic that will update older records, because they will contain NULL values.

BS_THE_ANALYST
Esteemed Contributor

@shan-databricks there's certainly ways for the schema to evolve within your delta tables that's supported out of the box: https://docs.databricks.com/aws/en/delta/update-schema#enable-schema-evolution 

To update older records, they'd likely have NULL values in the new column. You could use an UPDATE statement as a one-off to populate them. You could also look into a MERGE statement that is very flexible https://docs.databricks.com/aws/en/sql/language-manual/delta-merge-into#when-matched . This part requires you to write some logic. If you're not confident, you could always use AI + the Databricks Free Edition to practice on in your own environment before implementing it on the real thing.

All the best,
BS

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now