Hi @Faisal,
The maintenance cluster in Delta Live Tables (DLT) is responsible for automatically running operations such as OPTIMIZE, ZORDER, and VACUUM on your Delta tables to maintain good query performance and keep storage costs under control.
DLT runs these operations automatically on the tables a pipeline manages, typically within 24 hours of a table being updated. The maintenance cluster is the compute that executes them, so configuring it lets you control that compute separately from the cluster that processes your pipeline updates.
Once a maintenance cluster is configured, it handles three maintenance tasks: VACUUM, OPTIMIZE, and Z-ordering. Note that DLT does not pick Z-order columns on its own; you declare them for each table, as described below.
- VACUUM: deletes data files that the current version of the table no longer references and that are older than the retention threshold (7 days by default). Stale files like these are left behind by operations such as UPDATE, DELETE, MERGE, and OPTIMIZE, so removing them reclaims storage.
- OPTIMIZE: compacts many small files into fewer, larger ones, which reduces the total number of files in the table, lowers per-file overhead, and improves read performance.
- Z-ORDER: colocates rows with similar values in the specified columns within the same files. Because Delta Lake keeps min/max statistics for each file, queries that filter on those columns can skip more files, which speeds up queries on large tables.
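To make the Z-ordering idea concrete, here is a toy, pure-Python model of the Z-order (Morton) curve. It interleaves the bits of two integer columns, sorts 16 rows by that key, splits them into "files" of 4 rows, and counts how many files a min/max data-skipping check would have to read for a filter. This only illustrates the technique; Delta Lake's actual ZORDER BY implementation handles real column types and file sizes, and every function name here is hypothetical.

```python
from itertools import product


def morton_key(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x and y: x bits go to even positions, y bits to odd."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def split_into_files(rows, rows_per_file):
    """Chunk an ordered list of rows into fixed-size 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_touched(files, column, value):
    """Count files whose min/max range for `column` could contain `value`."""
    count = 0
    for f in files:
        values = [row[column] for row in f]
        if min(values) <= value <= max(values):
            count += 1
    return count


# 16 rows covering every (x, y) pair in a 4x4 grid.
rows = [{"x": x, "y": y} for x, y in product(range(4), range(4))]

# Layout 1: plain sort by x (then y). Layout 2: sort by the Z-order key.
linear = split_into_files(sorted(rows, key=lambda r: (r["x"], r["y"])), 4)
zorder = split_into_files(sorted(rows, key=lambda r: morton_key(r["x"], r["y"])), 4)

for name, files in [("sort by x", linear), ("z-order", zorder)]:
    print(f"{name}: x == 0 reads {files_touched(files, 'x', 0)} files, "
          f"y == 0 reads {files_touched(files, 'y', 0)} files")
```

With the x-sorted layout, a filter on x == 0 reads 1 file but a filter on y == 0 reads all 4; with the Z-ordered layout each filter reads 2 files. That trade-off (good skipping on several columns instead of excellent skipping on one) is why Z-ordering pays off when queries filter on more than one column.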
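To tell the maintenance cluster which columns to Z-order in a DLT pipeline, set the "pipelines.autoOptimize.zOrderCols" table property on the table to a comma-separated list of column names. (There is no "USE ZORDER" statement; for a regular Delta table outside DLT, the equivalent is running OPTIMIZE <table> ZORDER BY (<columns>).) Good candidates are columns that appear frequently in query filters and join conditions, especially high-cardinality ones.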
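In a DLT pipeline, Z-order columns are declared through the `pipelines.autoOptimize.zOrderCols` table property, whose value is a comma-separated string of column names. The sketch below builds that property dictionary and, for comparison, the one-off `OPTIMIZE ... ZORDER BY` statement you would run on a non-DLT Delta table. The helper names, the table `sales.events`, and the columns `customer_id` / `event_date` are hypothetical.

```python
def zorder_table_properties(columns):
    """Build the DLT table_properties dict that declares Z-order columns."""
    if not columns:
        raise ValueError("at least one Z-order column is required")
    # DLT expects a single comma-separated string, e.g. "col_a,col_b".
    return {"pipelines.autoOptimize.zOrderCols": ",".join(columns)}


def optimize_zorder_sql(table, columns):
    """Equivalent manual SQL for a regular (non-DLT) Delta table."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


props = zorder_table_properties(["customer_id", "event_date"])
print(props)
print(optimize_zorder_sql("sales.events", ["customer_id", "event_date"]))
```

In a pipeline notebook, the dictionary would be passed to the table definition, e.g. `@dlt.table(table_properties=props)`; in a SQL pipeline the same key goes into the table's TBLPROPERTIES clause.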
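Configuring the maintenance cluster for DLT is straightforward and can be done from the Databricks UI. You do not create a standalone cluster; instead, you edit the pipeline settings (for example in the JSON view) and add an entry to the "clusters" list with "label": "maintenance". Settings in that entry apply only to the maintenance cluster, while the entry labeled "default" configures the cluster that runs pipeline updates.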
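A minimal sketch of the pipeline settings described above, built as a Python dictionary and printed as JSON. The part that matters is the `clusters` entry whose `label` is `maintenance`; the pipeline name `sales_pipeline` and the worker count are hypothetical placeholders, and a real settings document would normally contain more fields (libraries, target, and so on).

```python
import json


def pipeline_settings(name, default_workers):
    """Build DLT pipeline settings with separate default and maintenance clusters."""
    return {
        "name": name,
        "clusters": [
            # Cluster that runs the pipeline updates themselves.
            {"label": "default", "num_workers": default_workers},
            # Cluster that runs the automatic OPTIMIZE / VACUUM maintenance tasks.
            # Fields added here apply only to the maintenance cluster.
            {"label": "maintenance"},
        ],
    }


settings = pipeline_settings("sales_pipeline", default_workers=2)
print(json.dumps(settings, indent=2))
```

The printed JSON has the same shape as what you would paste into the pipeline's JSON settings editor.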
You can also tune how long removed files are kept before VACUUM deletes them with the "delta.deletedFileRetentionDuration" table property, and you can opt an individual table out of automatic OPTIMIZE by setting "pipelines.autoOptimize.managed" to false.
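The retention rule VACUUM applies can be modeled in a few lines: a file is eligible for deletion only if the current table version no longer references it and it was removed longer ago than the retention threshold. This is a deliberately simplified model, not Delta Lake's implementation; the `referenced` / `removed_at` fields and the file names are hypothetical.

```python
from datetime import datetime, timedelta


def vacuum_candidates(files, now, retention=timedelta(days=7)):
    """Return paths that VACUUM could delete in this simplified model.

    A file qualifies only if it is no longer referenced by the current
    table version AND it was removed before (now - retention).
    """
    cutoff = now - retention
    return [
        f["path"]
        for f in files
        if not f["referenced"]
        and f["removed_at"] is not None
        and f["removed_at"] < cutoff
    ]


now = datetime(2024, 1, 15)
files = [
    {"path": "part-000.parquet", "referenced": True, "removed_at": None},
    {"path": "part-001.parquet", "referenced": False, "removed_at": datetime(2024, 1, 3)},
    {"path": "part-002.parquet", "referenced": False, "removed_at": datetime(2024, 1, 12)},
]

print(vacuum_candidates(files, now))                               # default 7-day retention
print(vacuum_candidates(files, now, retention=timedelta(days=2)))  # shorter retention
```

With the default 7-day threshold only part-001.parquet qualifies; shortening the retention to 2 days also makes part-002.parquet eligible. On a real table the threshold comes from the property, e.g. ALTER TABLE ... SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 14 days'); keep in mind that a shorter retention also shortens how far back you can time travel.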
So once a maintenance cluster is in place, OPTIMIZE and VACUUM run automatically and can be tuned through table properties. Specifying the Z-order columns, however, is something you must do yourself for each table, as mentioned above.