Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Erik_L (Contributor II)
  • 4101 Views
  • 4 replies
  • 4 kudos

Resolved! Support for Parquet Brotli compression, or a workaround

Spark 3.3.1 supports the brotli compression codec, but when I use it to read parquet files from S3, I get:

INVALID_ARGUMENT: Unsupported codec for Parquet page: BROTLI

Example code:

df = (spark.read.format("parquet")
      .option("compression", "brotli")...

Latest Reply
Erik_L
Contributor II
  • 4 kudos

Given the new information I appended, I looked into the Delta caching and found that I can disable it:

.option("spark.databricks.io.cache.enabled", False)

This works as a workaround while I read these files in to save them locally in DBFS, but does it have perfo...
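The Databricks documentation toggles the disk cache (formerly called the Delta cache) at the session level with `spark.conf.set("spark.databricks.io.cache.enabled", ...)` rather than as a reader option. A minimal sketch, assuming you only want the cache off while copying the Brotli files, is a context manager that sets the value and restores the previous one afterwards. `spark_conf` and `CACHE_KEY` are hypothetical names, and the paths in the usage comment are placeholders.

```python
from contextlib import contextmanager
from typing import Any, Iterator

CACHE_KEY = "spark.databricks.io.cache.enabled"


@contextmanager
def spark_conf(spark: Any, key: str, value: str) -> Iterator[None]:
    """Temporarily set a session config and restore the previous value afterwards."""
    previous = spark.conf.get(key, None)  # None when the key was never set
    spark.conf.set(key, value)
    try:
        yield
    finally:
        if previous is None:
            spark.conf.unset(key)
        else:
            spark.conf.set(key, previous)


# Usage on Databricks (hypothetical paths):
# with spark_conf(spark, CACHE_KEY, "false"):
#     spark.read.parquet("s3://bucket/path/").write.format("delta").save("/tmp/copy")
```

Because Spark evaluates reads lazily, the action that actually scans the files (here the write) should run inside the `with` block so the setting is in effect when the data is read.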

by User16301467532 (New Contributor II)
  • 21582 Views
  • 9 replies
  • 1 kudos

How can I change the parquet compression algorithm from gzip to something else?

Spark, by default, uses gzip to store parquet files. I would like to change the compression algorithm from gzip to snappy or lz4.

Latest Reply
ZhenZeng
New Contributor II
  • 1 kudos

spark.sql("set spark.sql.parquet.compression.codec=gzip");
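The reply above sets `spark.sql.parquet.compression.codec` to gzip; for the snappy or lz4 codecs the question asks about, only the value changes. Note that since Spark 2.0 the default for this setting is already snappy (gzip was the default in Spark 1.x). A minimal sketch of the two places a codec can be chosen: session-wide through the config, or per write through the `compression` option, which overrides the session setting. `PARQUET_CODECS`, `normalize_codec`, `set_default_parquet_codec`, and `write_parquet` are hypothetical helper names, and the codec list is taken from the Spark 3.3 configuration docs.

```python
from typing import Any

# Values accepted by spark.sql.parquet.compression.codec as of Spark 3.3;
# newer releases may accept more (check the docs for your version).
PARQUET_CODECS = {"none", "uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd"}


def normalize_codec(codec: str) -> str:
    """Lower-case a codec name and reject anything outside PARQUET_CODECS."""
    name = codec.strip().lower()
    if name not in PARQUET_CODECS:
        raise ValueError(f"unsupported Parquet codec: {codec!r}")
    return name


def set_default_parquet_codec(spark: Any, codec: str) -> None:
    """Session-wide default for later Parquet writes (same effect as the SQL SET)."""
    spark.conf.set("spark.sql.parquet.compression.codec", normalize_codec(codec))


def write_parquet(df: Any, path: str, codec: str) -> None:
    """Per-write codec; the `compression` option overrides the session default."""
    df.write.option("compression", normalize_codec(codec)).mode("overwrite").parquet(path)


# Usage in a notebook where `spark` and `df` exist (hypothetical path):
# set_default_parquet_codec(spark, "snappy")
# write_parquet(df, "/tmp/example_lz4", "lz4")
```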
