09-23-2021 03:06 PM
Hi All,
I have a daily Spark job that reads and joins 3-4 source tables and writes the resulting DataFrame in Parquet format. The DataFrame has 100+ columns. Since the job runs daily, our deduplication logic identifies the latest record from each source table, joins them, and then overwrites the existing Parquet file.
The question is: is there a way to write incrementally, so the output is only changed when a record is new or the values of an existing record have changed?
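For context, here is a rough sketch of the current full-overwrite pattern described above. Table names, the key column `id`, the ordering column `updated_at`, and the output path are all hypothetical placeholders, not the actual job:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical source tables and key/timestamp columns, for illustration only.
src_a = spark.table("src_a")
src_b = spark.table("src_b")
src_c = spark.table("src_c")

# Keep only the latest record per key in each source (the dedup step).
w = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

def latest(df):
    return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

joined = latest(src_a).join(latest(src_b), "id").join(latest(src_c), "id")

# Current approach: rewrite the full Parquet output every day.
joined.write.mode("overwrite").parquet("/mnt/lake/output")
```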
09-23-2021 10:38 PM
Hi @Nazar! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Otherwise I will follow up with my team and get back to you soon. Thanks.
09-24-2021 11:19 AM
Thanks! Appreciate the quick response.
09-24-2021 09:12 PM
You're most welcome, @Nazar Shaik.
09-27-2021 04:09 AM
The MERGE functionality of Delta Lake is what you are looking for:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html
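A minimal sketch of what that could look like, assuming the existing Parquet output is first converted to (or rewritten as) a Delta table, and assuming a hypothetical path and business key `id`:

```python
from delta.tables import DeltaTable

# Hypothetical Delta table path; the current Parquet output would need to be
# converted to Delta first (e.g. CONVERT TO DELTA) or rewritten in Delta format.
target = DeltaTable.forPath(spark, "/mnt/lake/output_delta")

# `updates_df` stands for the deduplicated daily DataFrame produced by the
# existing join of the source tables (latest record per key).
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.id = s.id")  # match on the business key
    .whenMatchedUpdateAll()      # update existing records whose values changed
    .whenNotMatchedInsertAll()   # insert records that are new
    .execute()
)
```

If you want to avoid rewriting rows that haven't actually changed, `whenMatchedUpdateAll` also accepts an optional condition (for example, comparing a hash or last-updated column between source and target), so only genuinely changed records are updated.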
09-27-2021 02:55 PM
Thanks, werners.