Databricks Community

bigt23 · ‎10-17-2023

I just started reading `zstd`-compressed files in Databricks on Azure (Runtime 14.1, Spark 3.5.0).

I ran the following PySpark commands:

path = f"wasbs://{container}@{storageaccount}.blob.core.windows.net/test-zstd"
schema = "some schema"
df = spark.read.option("compression", "zstd").json(path, schema)
df.createOrReplaceTempView("TestTable")

Then I ran this SQL query:

%sql
select * from TestTable limit 100

It failed with the following error:

Error in SQL statement: IllegalArgumentException: Codec [zstd] is not available. Known codecs are bzip2, deflate, uncompressed, lz4, gzip, snappy, none.

Is there any way to read `zstd` files there?