<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Compression Export to volume is not working as expected in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</link>
    <description>&lt;P&gt;It sounds like Spark is splitting your output into many small files (one per row) despite coalesce(1). Can you try setting spark.sql.files.maxRecordsPerFile? This option limits how many records can be written into a single output file; if it is set to 1 (or any positive number), Spark starts a new file each time that limit is reached, regardless of the partition count from coalesce(). Setting it to 0 disables the limit.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;(table.coalesce(1)
      .write
      .mode("overwrite")
      .format(file_format)            # likely "csv"
      .option("header", "true")
      .option("delimiter", field_delimiter)
      .option("compression", "gzip")
      .option("maxRecordsPerFile", 0) # disable row-per-file split
      .save(temp_path))&lt;/LI-CODE&gt;
&lt;P&gt;But can you be more specific about the issue?&lt;/P&gt;</description>
    <pubDate>Mon, 01 Dec 2025 18:50:27 GMT</pubDate>
    <dc:creator>iyashk-DB</dc:creator>
    <dc:date>2025-12-01T18:50:27Z</dc:date>
    <item>
      <title>Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140767#M51528</link>
      <description>&lt;P&gt;I am trying to write data into a volume using the code below.&lt;/P&gt;&lt;PRE&gt;table.coalesce(1)
     .write
     .mode("overwrite")
     .format(file_format)
     .option("header", "true")
     .option("delimiter", field_delimiter)
     .option("compression", "gzip")
     .save(temp_path)&lt;/PRE&gt;&lt;P&gt;The command runs successfully, but when I download the file I see one file for each record of the table inside the zipped folder.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rakshakpr11_0-1764608677946.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22029iD8EBD0E7A2B37072/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rakshakpr11_0-1764608677946.png" alt="rakshakpr11_0-1764608677946.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Note: without compression, the file is exported as expected.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 17:06:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140767#M51528</guid>
      <dc:creator>rakshakpr11</dc:creator>
      <dc:date>2025-12-01T17:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140768#M51529</link>
      <description>&lt;P&gt;You’re not doing anything “wrong” in the write itself; this is mostly about how Spark writes files versus how the UI downloads them.&lt;/P&gt;&lt;P&gt;As a workaround, write without compression first, then compress the result yourself.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 17:34:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140768#M51529</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2025-12-01T17:34:15Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</link>
      <description>&lt;P&gt;It sounds like Spark is splitting your output into many small files (one per row) despite coalesce(1). Can you try setting spark.sql.files.maxRecordsPerFile? This option limits how many records can be written into a single output file; if it is set to 1 (or any positive number), Spark starts a new file each time that limit is reached, regardless of the partition count from coalesce(). Setting it to 0 disables the limit.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;(table.coalesce(1)
      .write
      .mode("overwrite")
      .format(file_format)            # likely "csv"
      .option("header", "true")
      .option("delimiter", field_delimiter)
      .option("compression", "gzip")
      .option("maxRecordsPerFile", 0) # disable row-per-file split
      .save(temp_path))&lt;/LI-CODE&gt;
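&lt;P&gt;If maxRecordsPerFile does not help, a fallback is to write the CSV uncompressed and gzip the single part file yourself afterwards. This is only a sketch: the directory and file names below are placeholders (in a notebook they would be Volume paths), and it assumes coalesce(1) produced exactly one part file.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import glob
import gzip
import os
import shutil
import tempfile

# Placeholder for temp_path; in Databricks this would be a Volume directory.
export_dir = tempfile.mkdtemp()

# Stand-in for the single part file Spark writes after coalesce(1).
with open(os.path.join(export_dir, "part-00000.csv"), "w") as f:
    f.write("col1|col2\na|b\n")

# Locate the one part file and compress it into a single .gz
# that contains every row.
part_file = glob.glob(os.path.join(export_dir, "part-*.csv"))[0]
final_file = os.path.join(export_dir, "export.csv.gz")
with open(part_file, "rb") as src, gzip.open(final_file, "wb") as dst:
    shutil.copyfileobj(src, dst)&lt;/LI-CODE&gt;
&lt;P&gt;Downloading that one .csv.gz from the UI then behaves like any ordinary gzip file.&lt;/P&gt;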
&lt;P&gt;But can you be more specific about the issue?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 18:50:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</guid>
      <dc:creator>iyashk-DB</dc:creator>
      <dc:date>2025-12-01T18:50:27Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140847#M51545</link>
      <description>&lt;P&gt;Your understanding of my problem is correct.&lt;BR /&gt;&lt;BR /&gt;I did try adding this option, but it is still not working.&lt;/P&gt;&lt;PRE&gt;.option("maxRecordsPerFile", 0)&lt;/PRE&gt;&lt;P&gt;To elaborate: I am trying to export the table to a volume as a single gzip-compressed file, but when gz compression is used I see one file per record of the table, and the file name is data from the table, e.g. col1data_col2data.&lt;BR /&gt;&lt;BR /&gt;field_delimiter - "|" (but after export the file names are separated by _, which is strange).&lt;BR /&gt;file_format - csv.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rakshakpr11_0-1764666743152.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22044i5C6BA363158C54EC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rakshakpr11_0-1764666743152.png" alt="rakshakpr11_0-1764666743152.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Note: without compression, the export works as expected, producing a single file with all the records inside.&lt;/P&gt;&lt;P&gt;Looking forward to your reply :)&lt;/P&gt;</description>
      <pubDate>Tue, 02 Dec 2025 09:15:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140847#M51545</guid>
      <dc:creator>rakshakpr11</dc:creator>
      <dc:date>2025-12-02T09:15:28Z</dc:date>
    </item>
  </channel>
</rss>

