Will consecutive delete insert affect z-ordering?

shubhadip
New Contributor

Let's say there is a Delta table partitioned by a date field. Using a WHERE condition, we delete all the rows for a given partition value, and then insert new data into the same date partition. If we run Z-order after inserting the data, do the earlier files also take part in the Z-order process? I ask because deleting the rows updates the transaction log, but the actual files still exist.

1 REPLY

Anonymous
Not applicable

@Shubhadip Ghosh:

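In Delta Lake, a DELETE does not physically remove data files from storage right away. Instead, Delta writes remove actions (often called tombstones) to the transaction log for the affected files. If a file contains only some matching rows, the surviving rows are first rewritten into a new file (or, when deletion vectors are enabled, the deleted rows are flagged in a deletion vector instead). When the predicate matches a whole partition, as in your case, every file in that partition is simply marked as removed. The old files stay on storage so that time travel and readers on earlier table versions keep working, until VACUUM cleans them up after the retention period.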

When you subsequently insert into the same table, Delta creates new files containing the inserted data and records them in the transaction log. Z-Ordering is not applied at insert time; it happens when you run OPTIMIZE ... ZORDER BY, which rewrites files so that rows with similar values in the specified column(s) are co-located, improving data skipping and query performance.
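As a rough sketch of the delete, insert, then Z-order sequence described above, the helper below only builds the SQL statements in order. The table names `events` and `events_staging`, the partition column `event_date`, and the Z-order column `customer_id` are hypothetical and not taken from the question; the values are interpolated directly for illustration only.

```python
def rewrite_partition_statements(table, partition_col, date_value, source, zorder_col):
    """Return the SQL statements, in order, for delete -> insert -> Z-order."""
    predicate = f"{partition_col} = '{date_value}'"
    return [
        # 1. Logically delete the partition (files are only marked as removed).
        f"DELETE FROM {table} WHERE {predicate}",
        # 2. Insert fresh data for the same partition (new files are added).
        f"INSERT INTO {table} SELECT * FROM {source} WHERE {predicate}",
        # 3. Z-order only that partition's currently active files.
        f"OPTIMIZE {table} WHERE {predicate} ZORDER BY ({zorder_col})",
    ]


# Hypothetical names, for illustration only.
statements = rewrite_partition_statements(
    "events", "event_date", "2024-01-01", "events_staging", "customer_id"
)
for stmt in statements:
    print(stmt)  # on a Databricks cluster you would pass each to spark.sql(stmt)
```

Restricting OPTIMIZE with a WHERE clause on the partition column keeps the rewrite limited to the partition that was just reloaded.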

Now, to answer your question: No. The earlier files that the delete marked as removed do not take part in Z-Ordering the newly inserted data. OPTIMIZE ... ZORDER BY works only on the active files of the current table version, that is, the files that have not been marked as removed in the transaction log.

The Delta transaction log is what defines the current table version: readers and OPTIMIZE replay its add and remove actions to work out which files are live. The removed files may still sit on storage, but because the current version of the log no longer references them, they are ignored both by queries and by Z-Ordering. They are physically deleted only when VACUUM runs.
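To make the "active files" idea concrete, here is a toy model of log replay in plain Python. It is a simplification under stated assumptions, not Delta's actual implementation: the `Action` class, the `active_files` function, and the file paths are all made up for illustration, and real log entries carry much more metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "add" or "remove"
    path: str  # data file path relative to the table root (hypothetical)


def active_files(log):
    """Replay log actions in order and return the set of live data files."""
    live = set()
    for action in log:
        if action.kind == "add":
            live.add(action.path)
        elif action.kind == "remove":
            live.discard(action.path)
    return live


log = [
    Action("add", "date=2024-01-01/part-000.parquet"),     # original load
    Action("add", "date=2024-01-01/part-001.parquet"),
    Action("remove", "date=2024-01-01/part-000.parquet"),  # DELETE of the partition
    Action("remove", "date=2024-01-01/part-001.parquet"),
    Action("add", "date=2024-01-01/part-002.parquet"),     # INSERT into the same partition
]

on_storage = {a.path for a in log if a.kind == "add"}  # nothing vacuumed yet
candidates = active_files(log)                         # files a Z-order pass would consider
leftovers = on_storage - candidates                    # still on disk, ignored until VACUUM

print(sorted(candidates))
print(sorted(leftovers))
```

In this model only `part-002` is a candidate for Z-Ordering; the two older files remain on storage but are invisible to the current table version.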
