With Predictive I/O for reads (GA) and updates (Public Preview), Databricks SQL can now analyze historical read and write patterns to intelligently build indexes and optimize DELETE, MERGE, and UPDATE operations.
What is Predictive I/O?
Predictive I/O is a collection of Databricks optimizations that improve performance for data interactions. Predictive I/O capabilities are grouped into the following categories:
- Accelerated reads reduce the time it takes to scan and read data.
- Accelerated updates reduce the amount of data that needs to be rewritten during updates, deletes, and merges.
Predictive I/O leverages deletion vectors to accelerate updates by reducing the frequency of full file rewrites during data modification on Delta tables. It optimizes DELETE, MERGE, and UPDATE operations.
Rather than rewriting all records in a data file when any record is updated or deleted, Predictive I/O uses deletion vectors to mark which records have been removed from the target data files. Supplemental data files are used to record updates.
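As a rough illustration, these are the kinds of statements that accelerated updates target. This is a minimal sketch: the table names `sales` and `sales_updates` and their columns are hypothetical, and it assumes deletion vectors are already enabled on `sales`.

```sql
-- Hypothetical tables: sales(order_id, amount, status) and sales_updates(order_id, amount).

-- DELETE: matching rows can be marked as removed in a deletion vector
-- instead of rewriting every data file that contains them.
DELETE FROM sales WHERE status = 'cancelled';

-- UPDATE: old row versions are marked as removed, and the new versions
-- are written to supplemental data files.
UPDATE sales SET amount = amount * 1.1 WHERE status = 'pending';

-- MERGE: combines both behaviors for matched and unmatched rows.
MERGE INTO sales AS t
USING sales_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);
```

The statements themselves are ordinary Delta Lake DML. The optimization is applied by the engine and requires no change to query syntax.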
How to get started:
1. Use a serverless or pro SQL warehouse, or a Photon-accelerated cluster running Databricks Runtime 11.2 or above.
2. Enable support for deletion vectors on a Delta Lake table by setting the following table property:
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
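To confirm that the property took effect, one option is to read it back from the table. The sketch below assumes a hypothetical table named `sales`; substitute your own table name.

```sql
-- Hypothetical table name: sales.
-- Enable deletion vectors on an existing table (same property as above).
ALTER TABLE sales SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- Read the property back; the value column should show true once it is set.
SHOW TBLPROPERTIES sales ('delta.enableDeletionVectors');
```

For a new table, the same property can be supplied in the `TBLPROPERTIES` clause of `CREATE TABLE`, so the table has deletion vectors enabled from the start.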
Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables.
Things to consider:
When you enable deletion vectors, the table protocol version is upgraded. Table protocol version upgrades are not reversible. After upgrading, the table will not be readable by Delta Lake clients that do not support deletion vectors. See How does Databricks manage Delta Lake feature compatibility?
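Because the protocol upgrade cannot be undone, it may be worth recording the table's protocol versions before enabling the feature and comparing them afterward. A minimal sketch, again assuming a hypothetical table named `sales`:

```sql
-- Hypothetical table name: sales.
-- Run before enabling deletion vectors and note the minReaderVersion
-- and minWriterVersion columns in the output.
DESCRIBE DETAIL sales;

-- After running ALTER TABLE ... SET TBLPROPERTIES, run it again and compare.
-- Higher values mean that older Delta Lake clients may no longer be able
-- to read or write the table.
DESCRIBE DETAIL sales;
```

If other systems read the table, check that their Delta Lake clients support the upgraded protocol before making the change in production.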
Predictive I/O for updates shares all limitations of deletion vectors. In Databricks Runtime 12.1 and above, the following limitations exist:
Resources: