Hello, I have been tasked with optimizing some of our existing Delta Lake tables in Databricks. I was able to run the following statement on some of our Delta tables, but I wasn't able to execute the same statement against some of our streaming tables:
ALTER TABLE table_name
CLUSTER BY (column1, column2);
This is the error I receive when I execute the above statement against our streaming tables:
[STREAMING_TABLE_OPERATION_NOT_ALLOWED.UNSUPPORTED_OPERATION] The operation ALTER TABLE is not allowed: The operation is not supported on Streaming Tables. SQLSTATE: 42601
My goal is to apply liquid clustering to all of our tables, but it looks like we may have to rebuild the streaming tables from scratch to get liquid clustering onto them. Is that right? (A rough sketch of what I think the rebuild would look like is at the end of this post.) As an alternative, I thought I could apply z-ordering to the streaming tables, since the OPTIMIZE statement below ran successfully against them. What is the right way to apply z-ordering: can I just keep executing this SQL every day, or should I set the pipeline config properties in my code?
OPTIMIZE table_name ZORDER BY column1, column2;
Or
pipelines.autoOptimize.managed = true
pipelines.autoOptimize.zOrderCols = "column1,column2"
source: https://docs.databricks.com/en/delta-live-tables/properties.html
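For context, here is roughly what I assume the two options would look like, based on my reading of the docs. I haven't tested either statement against our pipeline yet, and the table name, columns, and source path are all placeholders:

-- Option A (my assumption): declare liquid clustering when the streaming table is rebuilt
CREATE OR REFRESH STREAMING TABLE table_name
CLUSTER BY (column1, column2)
AS SELECT * FROM STREAM read_files('/path/to/source');  -- placeholder source

-- Option B (my assumption): keep z-ordering, but attach the auto-optimize
-- properties to the table definition instead of running OPTIMIZE by hand
CREATE OR REFRESH STREAMING TABLE table_name
TBLPROPERTIES (
  'pipelines.autoOptimize.managed' = 'true',
  'pipelines.autoOptimize.zOrderCols' = 'column1,column2'
)
AS SELECT * FROM STREAM read_files('/path/to/source');  -- placeholder source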
I'd appreciate any feedback and guidance on the best path forward to get the best possible optimization on my Delta tables and limit file scans.