Re: Is Photon Acceleration Helpful for All Mainten...

szymon_dybczak · 09-01-2025

I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines.
VACUUM and OPTIMIZE are maintenance tasks, and using Photon for them would be a pricey overkill.
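In practice this means keeping the ETL job and the maintenance job on separate job clusters. Below is a minimal sketch, assuming the payload goes into the `new_cluster` section of a Databricks Jobs/Clusters API request. The field names (`spark_version`, `node_type_id`, `num_workers`, `runtime_engine`) come from that API. The helper `job_cluster`, the runtime version and the instance type are hypothetical placeholders.

```python
# Hypothetical helper: builds a "new_cluster" payload for a Databricks job.
# Field names follow the Clusters API; the concrete values are placeholders.
def job_cluster(photon: bool, node_type_id: str = "i3.xlarge", num_workers: int = 2) -> dict:
    return {
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": node_type_id,         # placeholder AWS instance type
        "num_workers": num_workers,
        # runtime_engine picks the engine: "PHOTON" or "STANDARD"
        "runtime_engine": "PHOTON" if photon else "STANDARD",
    }

# Photon for the heavy ETL job (MERGE, large scans, joins, aggregations) ...
etl_cluster = job_cluster(photon=True, num_workers=8)
# ... and the standard engine for the maintenance job (VACUUM, OPTIMIZE).
maintenance_cluster = job_cluster(photon=False)

print(etl_cluster["runtime_engine"])          # PHOTON
print(maintenance_cluster["runtime_engine"])  # STANDARD
```

Because Photon is a per-cluster setting, splitting the jobs lets you pay for it only where it actually speeds things up.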

According to the documentation, Databricks recommends enabling Photon for workloads with the following characteristics:

ETL pipelines consisting of Delta MERGE operations
Writing large volumes of data to cloud storage (Delta/Parquet)
Scans of large data sets, joins, aggregations and decimal computations
Auto Loader to incrementally and efficiently process new data arriving in storage
Interactive/ad hoc queries using SQL

The documented advantages of Photon include:

Accelerated queries that process a significant amount of data (> 100GB) and include aggregations and joins
Faster performance when data is accessed repeatedly from the Delta cache
More robust scan/read performance on tables with many columns and many small files
Faster Delta writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT
Join improvements

Comprehensive Guide to Optimize Data Workloads | Databricks

For instance, for VACUUM Databricks recommends using compute-optimized instances. Since OPTIMIZE is also compute-intensive, I would guess the same recommendation applies to it.
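Putting that together, a maintenance job could run on a compute-optimized instance with Photon left off. This is a rough sketch under assumptions. It assumes AWS, and `c5.4xlarge` is used only as an example of a compute-optimized EC2 type. The helper `maintenance_job` and the table name `main.sales.orders` are hypothetical. The 168-hour retention matches Delta's default 7-day VACUUM threshold.

```python
# Hypothetical maintenance job spec (assumes AWS; values are examples only).
def maintenance_job(table: str, retain_hours: int = 168) -> dict:
    return {
        "new_cluster": {
            "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
            "node_type_id": "c5.4xlarge",         # compute-optimized EC2 type
            "num_workers": 2,
            "runtime_engine": "STANDARD",         # Photon left off
        },
        # OPTIMIZE compacts small files; VACUUM then deletes files that are
        # no longer referenced and are older than the retention window.
        "statements": [
            f"OPTIMIZE {table}",
            f"VACUUM {table} RETAIN {retain_hours} HOURS",
        ],
    }

job = maintenance_job("main.sales.orders")
for stmt in job["statements"]:
    print(stmt)
```

Each statement could then be executed with `spark.sql(stmt)` from a notebook task on that cluster.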