Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Behavior of Zstd Compression for Delta Tables Across Different Databricks Runtime Versions

pooja_bhumandla
New Contributor III

Hi all,

For ZSTD compression, as per the documentation, any table created with DBR 16.0 or newer (or Apache Spark 3.5+) uses Zstd as the default compression codec instead of Snappy.

I explicitly set the table property to Zstd:

spark.sql("""
ALTER TABLE my_table
SET TBLPROPERTIES ('delta.compression.codec' = 'zstd')
""")

I also ran a full optimize on the table:

OPTIMIZE my_table FULL

After the optimization, the data files are indeed compressed using Zstd.
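
For anyone who wants to repeat that check, here is a minimal sketch that guesses each data file's codec from its name. It assumes the usual Spark naming convention in which the codec appears just before the extension (for example `.zstd.parquet` or `.snappy.parquet`); that convention is an assumption rather than a guarantee, and the Parquet footer remains the authoritative source. The helper names `codec_from_filename` and `codec_summary` are hypothetical.

```python
from collections import Counter

# Codec tags discussed in this thread. Other codecs, or files whose names
# carry no codec tag, are reported as "unknown" rather than guessed.
KNOWN_CODECS = ("zstd", "snappy")


def codec_from_filename(path: str) -> str:
    """Guess the compression codec from a Parquet data file name."""
    name = path.rsplit("/", 1)[-1].lower()
    parts = name.split(".")
    # Expect "...<codec>.parquet"; anything else is treated as unknown.
    if len(parts) >= 3 and parts[-1] == "parquet" and parts[-2] in KNOWN_CODECS:
        return parts[-2]
    return "unknown"


def codec_summary(paths):
    """Count data files per guessed codec."""
    return Counter(codec_from_filename(p) for p in paths)
```

The list of paths could come from something like `spark.table("my_table").inputFiles()`, which returns the files backing the current table snapshot. Running the summary after a write from the older runtime would show whether any Snappy files appeared.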

My question is about future writes:

If this table is later written to from a cluster running DBR 15.4 (or any runtime prior to 16.0), will the new output files still use Zstd (because of the table property), or will they revert to Snappy (because the runtime is older than 16.0)?
I’d appreciate any clarification or insights on how Delta handles compression across different runtimes.

Thanks!

2 REPLIES

JAHNAVI
Databricks Employee

@pooja_bhumandla 

New files written by DBR 15.4 (or any pre‑16.0 runtime) will still use Zstd as long as the table property delta.compression.codec = 'zstd' remains set on the table.

When we explicitly run:

ALTER TABLE my_table
SET TBLPROPERTIES ('delta.compression.codec' = 'zstd');

any runtime that understands this property will write new Parquet files in Zstd for that table, regardless of its own default compression codec.
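
Since this depends on the property still being set, a quick check before writing from another cluster may help. The following is a minimal sketch, assuming an active `spark` session and that `SHOW TBLPROPERTIES` returns `key` and `value` columns; the function name `table_codec_property` is hypothetical, and the default key is simply the property name used in this thread.

```python
def table_codec_property(spark, table, key="delta.compression.codec"):
    """Return the table property value for `key`, or None if it is not set."""
    rows = spark.sql(f"SHOW TBLPROPERTIES {table}").collect()
    # Each row exposes the property name and value under "key" and "value".
    props = {row["key"]: row["value"] for row in rows}
    return props.get(key)
```

For example, `table_codec_property(spark, "my_table")` would be expected to return `'zstd'` after the ALTER TABLE statement above, and `None` if the property was never set or has been unset.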


Jahnavi N

@JAHNAVI 

Thanks for the clarification.

Just to make sure I’m understanding this correctly for new table creation:
If a Delta table is created on DBR 15.4 with the compression property explicitly set, for example:

CREATE TABLE my_table (
...
)
USING DELTA
TBLPROPERTIES ('delta.compression.codec' = 'zstd');


Will the initial data files written during table creation use Zstd because of the table property, or does the DBR 15.4 runtime default (Snappy) still apply at creation time?
I’m specifically asking about the codec used for the data files created as part of the initial table creation.

Thanks again for your help.