szymon_dybczak
Esteemed Contributor III

Hi @Sainath368 ,

I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines.
VACUUM and OPTIMIZE are maintenance tasks, and running them on Photon would be pricey overkill.

According to the documentation, enabling Photon is recommended for workloads with the following characteristics:

  • ETL pipelines consisting of Delta MERGE operations
  • Writing large volumes of data to cloud storage (Delta/Parquet)
  • Scans of large data sets, joins, aggregations and decimal computations
  • Auto Loader to incrementally and efficiently process new data arriving in storage
  • Interactive/ad hoc queries using SQL

Regarding the advantages of Photon:

  • Accelerated queries that process a significant amount of data (> 100GB) and include aggregations and joins
  • Faster performance when data is accessed repeatedly from the Delta cache
  • More robust scan/read performance on tables with many columns and many small files
  • Faster Delta writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT
  • Join improvements

Source: Comprehensive Guide to Optimize Data Workloads | Databricks

 


For instance, Databricks recommends compute-optimized instances for VACUUM. Since OPTIMIZE is also compute-intensive, I'd guess the same recommendation applies to it.
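To make that concrete, here is a minimal sketch of what a dedicated maintenance job could look like: a job-cluster spec with Photon turned off, plus the statements the job would run. Treat it as an illustration under assumptions, not a recommended configuration. The `runtime_engine` field is how the Clusters API selects between `STANDARD` and `PHOTON`; the node type (`c5.4xlarge`, an AWS compute-optimized instance), the runtime version, the worker count, the table name `main.sales.orders`, and both helper functions are placeholders I made up for the example.

```python
import json


def maintenance_cluster_spec(node_type_id="c5.4xlarge", num_workers=2):
    """Build a job-cluster spec for maintenance work with Photon disabled.

    All default values are assumptions for illustration; adjust them to
    your cloud provider and workspace.
    """
    return {
        "spark_version": "15.4.x-scala2.12",  # assumed runtime version
        "node_type_id": node_type_id,         # assumed compute-optimized type (AWS)
        "num_workers": num_workers,
        "runtime_engine": "STANDARD",         # STANDARD means Photon is off
    }


def maintenance_statements(table, retain_hours=168):
    """Return the SQL statements for one table: compact first, then clean up."""
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]


spec = maintenance_cluster_spec()
statements = maintenance_statements("main.sales.orders")  # hypothetical table

print(json.dumps(spec, indent=2))
for statement in statements:
    print(statement)
```

In a notebook or job task running on that cluster, you would pass each statement to `spark.sql(...)`. The 168-hour retention simply spells out the default 7-day VACUUM threshold.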
