Hi all,
According to the documentation, any table created with DBR 16.0 or newer (or Apache Spark 3.5+) uses Zstd as its default compression codec instead of Snappy.
I explicitly set the table property to Zstd:
spark.sql("""
ALTER TABLE my_table
SET TBLPROPERTIES ('delta.compression.codec' = 'zstd')
""")
I also ran a full optimize on the table:
OPTIMIZE my_table FULL
After the optimization, the data files are indeed compressed using Zstd.
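For reference, here is roughly how I verified the codec on the files themselves, by inspecting a Parquet footer with pyarrow (the file path below is just a placeholder for one of the table's data files):

import pyarrow.parquet as pq

# Placeholder path to one of the table's rewritten data files
pf = pq.ParquetFile("/path/to/my_table/part-00000-example.parquet")
# Compression codec of the first column chunk in the first row group
print(pf.metadata.row_group(0).column(0).compression)  # prints 'ZSTD'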
My question is about future writes:
if this table is later written to from a cluster running DBR 15.4 (or any other runtime prior to 16.0), will the new output files still use Zstd (because of the table property), or will they revert to Snappy (because the writer is on DBR < 16.0)?
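And if the table property is ignored by older writers, would I need to pin the codec at the session level on the DBR 15.4 cluster as a safety net, something like the following (assuming the Spark-level Parquet config also governs Delta's Parquet output)?

# Session-level fallback on the older cluster (assumption: this config
# applies to Delta writes when the table property is not honored)
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")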
I’d appreciate any clarification or insights on how Delta handles compression across different runtimes.
Thanks!