Solved: Re: Incremental write - Databricks Community - 14562

Hi All,

I have a daily Spark job that reads and joins 3-4 source tables and writes the resulting DataFrame in Parquet format. The DataFrame has 100+ columns. Because the job runs daily, our deduplication logic identifies the latest record from each source table, joins them, and then overwrites the existing Parquet file.

My question: is there a way to write incrementally, so that data is written only when there is a new record or when values in an existing record of the file have changed?

1 ACCEPTED SOLUTION


The MERGE functionality of Delta Lake is what you are looking for.

https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge...
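For illustration, here is a minimal sketch of how such a MERGE could look with the Delta Lake Python API. It assumes the output is stored as a Delta table rather than plain Parquet, and that the rows can be matched on one or more key columns. The function names, the `t`/`s` aliases, and the example path and key column are hypothetical and not taken from the original job. Matched rows are updated only when at least one non-key column differs (using the null-safe `<=>` comparison), and unmatched rows are inserted.

```python
from typing import Sequence


def key_condition(key_cols: Sequence[str]) -> str:
    """Join condition that matches target (t) and source (s) rows on the keys."""
    return " AND ".join(f"t.`{c}` = s.`{c}`" for c in key_cols)


def change_condition(value_cols: Sequence[str]) -> str:
    """True when any non-key column differs; <=> treats NULL as equal to NULL."""
    if not value_cols:
        return "false"  # nothing besides the keys, so nothing can change
    return " OR ".join(f"NOT (t.`{c}` <=> s.`{c}`)" for c in value_cols)


def merge_incremental(spark, updates_df, target_path: str,
                      key_cols: Sequence[str]) -> None:
    """Upsert updates_df into the Delta table at target_path.

    Assumes target_path already holds a Delta table and that updates_df
    has been deduplicated to one row per key.
    """
    # Imported here so the helpers above work without delta-spark installed.
    from delta.tables import DeltaTable

    value_cols = [c for c in updates_df.columns if c not in key_cols]
    target = DeltaTable.forPath(spark, target_path)
    (
        target.alias("t")
        .merge(updates_df.alias("s"), key_condition(key_cols))
        # Rewrite an existing row only if some value actually changed.
        .whenMatchedUpdateAll(condition=change_condition(value_cols))
        # Rows with keys not yet in the target are inserted.
        .whenNotMatchedInsertAll()
        .execute()
    )


# Hypothetical usage inside the daily job:
# merge_incremental(spark, daily_df, "/path/to/delta/table", ["id"])
```

With 100+ columns the generated change condition is long but is built automatically from the DataFrame's column list. The source DataFrame should contain at most one row per key, since MERGE fails when several source rows match the same target row.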

Thanks, I appreciate the quick response.

Thanks werners
