With Predictive I/O for reads (GA) and updates (Public Preview), Databricks SQL can now analyze historical read and write patterns to intelligently build indexes and optimize DELETE, MERGE, and UPDATE operations.
What is Predictive I/O?
Predictive I/O is a collection of Databricks optimizations that improve performance for data interactions. Predictive I/O capabilities are grouped into the following categories:
Accelerated reads reduce the time it takes to scan and read data.
Accelerated updates reduce the amount of data that needs to be rewritten during updates, deletes, and merges.
Predictive I/O leverages deletion vectors to accelerate updates by reducing the frequency of full file rewrites during data modification on Delta tables. Predictive I/O optimizes DELETE, MERGE, and UPDATE operations.
Rather than rewriting all records in a data file when any record is updated or deleted, Predictive I/O uses deletion vectors to mark records as removed from the target data files. Supplemental data files are used to indicate updates.
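As a rough illustration, the statements below are the kinds of row-level modifications that accelerated updates target. This is only a sketch: the table names (sales.orders, sales.order_changes) and columns (order_id, order_status) are hypothetical and do not come from the product documentation.

```sql
-- Hypothetical Delta tables: sales.orders (target), sales.order_changes (source).

-- DELETE: remove a subset of rows from the target table.
DELETE FROM sales.orders
WHERE order_status = 'cancelled';

-- UPDATE: change a value on matching rows.
UPDATE sales.orders
SET order_status = 'shipped'
WHERE order_id = 1001;

-- MERGE: upsert changes from a source table into the target.
MERGE INTO sales.orders AS t
USING sales.order_changes AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.order_status = s.order_status
WHEN NOT MATCHED THEN
  INSERT (order_id, order_status) VALUES (s.order_id, s.order_status);
```

With deletion vectors enabled on the target table, statements like these can mark the affected rows as removed instead of rewriting every data file that contains one of them.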
How to get started:
1. Use a serverless or pro SQL warehouse, or a Photon-accelerated cluster running Databricks Runtime 11.2 or above.
2. Enable support for deletion vectors on a Delta Lake table by setting a Delta Lake table property, as shown below:
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables.
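To make the second step concrete, here is a minimal sketch that enables the property on an existing table, checks that it took effect, and sets it at creation time for a new table. The table names sales.orders and sales.orders_v2 and their columns are assumptions for illustration only.

```sql
-- Hypothetical table name; replace with your own Delta table.
ALTER TABLE sales.orders
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- Confirm that the property is now set on the table.
SHOW TBLPROPERTIES sales.orders ('delta.enableDeletionVectors');

-- Alternatively, set the property when creating a new Delta table
-- (hypothetical table and columns).
CREATE TABLE sales.orders_v2 (
  order_id BIGINT,
  order_status STRING
)
USING DELTA
TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```

Setting the property at creation time avoids a separate ALTER TABLE step.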