Data Engineering

by Erik_L • Contributor II

01-31-2023 5:31:49 PM

6929 Views
4 replies
4 kudos

Resolved! Support for Parquet brotli compression or a work around

Spark 3.3.1 supports the brotli compression codec, but when I use it to read parquet files from S3, I get:

INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI

Example code:

df = (spark.read.format("parquet") .option("compression", "brotli")...

Latest Reply

Erik_L
Contributor II

02-01-2023 1:48:21 PM

4 kudos

Given the new information I appended, I looked into the Delta caching and I can disable it:

.option("spark.databricks.io.cache.enabled", False)

This works as a workaround while I read these files in to save them locally in DBFS, but does it have perfo...
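For readers who want to try the same workaround, here is a minimal sketch. Databricks documents `spark.databricks.io.cache.enabled` as a Spark configuration that can be set with `spark.conf.set`, so the sketch sets it at the session level instead of as a reader option. The helper name `disk_cache_disabled` is hypothetical, and `spark` is assumed to be the session a Databricks notebook already provides. Whether this avoids the BROTLI error on a given runtime is an assumption based on the reply above, not something verified here.

```python
from contextlib import contextmanager

# Key taken from the reply above.
CACHE_KEY = "spark.databricks.io.cache.enabled"


@contextmanager
def disk_cache_disabled(spark):
    """Hypothetical helper: turn the cache off for a block, then restore it."""
    previous = spark.conf.get(CACHE_KEY, None)  # None if the key was never set
    spark.conf.set(CACHE_KEY, "false")
    try:
        yield
    finally:
        # Put the session back the way it was found.
        if previous is None:
            spark.conf.unset(CACHE_KEY)
        else:
            spark.conf.set(CACHE_KEY, previous)
```

Spark reads lazily, so the action that actually scans the files (for example the write that saves them to DBFS) should run inside the `with disk_cache_disabled(spark):` block, not after it.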

by User16301467532 • Databricks Employee

07-15-2015 11:45:24 AM

26226 Views
9 replies
1 kudos

How can I change the parquet compression algorithm from gzip to something else?

Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.

Latest Reply

ZhenZeng
New Contributor II

10-01-2019 2:10:05 AM

1 kudos

spark.sql("set spark.sql.parquet.compression.codec=gzip");
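The same setting can be applied from PySpark, either for the whole session or for a single write. The sketch below assumes `spark` is an existing session and `df` an existing DataFrame; both function names are hypothetical, and `"snappy"` is used only because the question asks for it. Which codecs are accepted depends on the Spark version and the libraries installed on the cluster.

```python
PARQUET_CODEC_KEY = "spark.sql.parquet.compression.codec"


def set_session_parquet_codec(spark, codec="snappy"):
    """Set the default Parquet codec for later writes in this session."""
    spark.conf.set(PARQUET_CODEC_KEY, codec)
    return spark.conf.get(PARQUET_CODEC_KEY)


def write_parquet_with_codec(df, path, codec="snappy"):
    """Override the codec for one write without changing the session default."""
    df.write.option("compression", codec).parquet(path)
```

The per-write `compression` option is the narrower choice when only one output needs a different codec. The session setting affects every Parquet write that follows.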
