cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Erik_L
by Contributor II
  • 1825 Views
  • 3 replies
  • 4 kudos

Resolved! Support for Parquet brotli compression or a work around

Spark 3.3.1 supports the brotli compression codec, but when I use it to read parquet files from S3, I get:INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLIExample code:df = (spark.read.format("parquet") .option("compression", "brotli")...

  • 1825 Views
  • 3 replies
  • 4 kudos
Latest Reply
Erik_L
Contributor II
  • 4 kudos

Given the new information I appended, I looked into the Delta caching and I can disable it:.option("spark.databricks.io.cache.enabled", False)This works as a work around while I read these files in to save them locally in DBFS, but does it have perfo...

  • 4 kudos
2 More Replies
User16301467532
by New Contributor II
  • 16372 Views
  • 9 replies
  • 1 kudos

How can I change the parquet compression algorithm from gzip to something else?

Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.

  • 16372 Views
  • 9 replies
  • 1 kudos
Latest Reply
ZhenZeng
New Contributor II
  • 1 kudos

spark.sql("set spark.sql.parquet.compression.codec=gzip");

  • 1 kudos
8 More Replies
Labels