<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Compression Export to volume is not working as expected in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</link>
    <description>&lt;P&gt;It sounds like Spark is splitting your output into many small files (one per row) despite coalesce(1). Can you try setting spark.sql.files.maxRecordsPerFile? This option limits how many records can be written into a single output file; if it is set to 1 (or any positive number), Spark starts a new file each time that limit is reached, regardless of the partition count from coalesce(). Setting it to 0 disables the limit.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;(table.coalesce(1)
      .write
      .mode("overwrite")
      .format(file_format)            # likely "csv"
      .option("header", "true")
      .option("delimiter", field_delimiter)
      .option("compression", "gzip")
      .option("maxRecordsPerFile", 0) # disable row-per-file split
      .save(temp_path))&lt;/LI-CODE&gt;
&lt;P&gt;But can you be more specific about the issue?&lt;/P&gt;</description>
    <pubDate>Mon, 01 Dec 2025 18:50:27 GMT</pubDate>
    <dc:creator>iyashk-DB</dc:creator>
    <dc:date>2025-12-01T18:50:27Z</dc:date>
    <item>
      <title>Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140767#M51528</link>
      <description>&lt;P&gt;I am trying to write data into a volume using the code below.&lt;/P&gt;&lt;PRE&gt;table.coalesce(1)
     .write
     .mode("overwrite")
     .format(file_format)
     .option("header", "true")
     .option("delimiter", field_delimiter)
     .option("compression", "gzip")
     .save(temp_path)&lt;/PRE&gt;&lt;P&gt;The command runs successfully, but when I download the file I see one file for each record of the table inside the zipped folder.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rakshakpr11_0-1764608677946.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22029iD8EBD0E7A2B37072/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rakshakpr11_0-1764608677946.png" alt="rakshakpr11_0-1764608677946.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Note: without compression, the file is exported as expected.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 17:06:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140767#M51528</guid>
      <dc:creator>rakshakpr11</dc:creator>
      <dc:date>2025-12-01T17:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140768#M51529</link>
      <description>&lt;P&gt;You’re not doing anything “wrong” in the write itself; this is mostly about how Spark writes files versus how the UI downloads them.&lt;/P&gt;&lt;P&gt;As a workaround, write without compression first, then compress the result yourself.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 17:34:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140768#M51529</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2025-12-01T17:34:15Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</link>
      <description>&lt;P&gt;It sounds like Spark is splitting your output into many small files (one per row) despite coalesce(1). Can you try setting spark.sql.files.maxRecordsPerFile? This option limits how many records can be written into a single output file; if it is set to 1 (or any positive number), Spark starts a new file each time that limit is reached, regardless of the partition count from coalesce(). Setting it to 0 disables the limit.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;(table.coalesce(1)
      .write
      .mode("overwrite")
      .format(file_format)            # likely "csv"
      .option("header", "true")
      .option("delimiter", field_delimiter)
      .option("compression", "gzip")
      .option("maxRecordsPerFile", 0) # disable row-per-file split
      .save(temp_path))&lt;/LI-CODE&gt;
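&lt;P&gt;If maxRecordsPerFile does not help, a fallback is to write the CSV uncompressed and gzip the single part file yourself afterwards. This is only a sketch: the directory and file names below are placeholders (in a notebook they would be Volume paths), and it assumes coalesce(1) produced exactly one part file.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import glob
import gzip
import os
import shutil
import tempfile

# Placeholder for temp_path; in Databricks this would be a Volume directory.
export_dir = tempfile.mkdtemp()

# Stand-in for the single part file Spark writes after coalesce(1).
with open(os.path.join(export_dir, "part-00000.csv"), "w") as f:
    f.write("col1|col2\na|b\n")

# Locate the one part file and compress it into a single .gz
# that contains every row.
part_file = glob.glob(os.path.join(export_dir, "part-*.csv"))[0]
final_file = os.path.join(export_dir, "export.csv.gz")
with open(part_file, "rb") as src, gzip.open(final_file, "wb") as dst:
    shutil.copyfileobj(src, dst)&lt;/LI-CODE&gt;
&lt;P&gt;Downloading that one .csv.gz from the UI then behaves like any ordinary gzip file.&lt;/P&gt;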
&lt;P&gt;But can you be more specific about the issue?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Dec 2025 18:50:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140772#M51530</guid>
      <dc:creator>iyashk-DB</dc:creator>
      <dc:date>2025-12-01T18:50:27Z</dc:date>
    </item>
    <item>
      <title>Re: Compression Export to volume is not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140847#M51545</link>
      <description>&lt;P&gt;Your understanding of my problem is correct.&lt;BR /&gt;&lt;BR /&gt;I did try adding this option, but it is still not working.&lt;/P&gt;&lt;PRE&gt;.option("maxRecordsPerFile", 0)&lt;/PRE&gt;&lt;P&gt;To elaborate: I am trying to export the table to a volume as a single gzip-compressed file, but when gz compression is used I see one file per record of the table, and the file name is data from the table, e.g. col1data_col2data.&lt;BR /&gt;&lt;BR /&gt;field_delimiter - "|" (but after export the file names are separated by _, which is strange).&lt;BR /&gt;file_format - csv.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rakshakpr11_0-1764666743152.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22044i5C6BA363158C54EC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rakshakpr11_0-1764666743152.png" alt="rakshakpr11_0-1764666743152.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Note: without compression, the export works as expected, producing a single file with all the records inside.&lt;/P&gt;&lt;P&gt;Looking forward to your reply :)&lt;/P&gt;</description>
      <pubDate>Tue, 02 Dec 2025 09:15:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/compression-export-to-volume-is-not-working-as-expected/m-p/140847#M51545</guid>
      <dc:creator>rakshakpr11</dc:creator>
      <dc:date>2025-12-02T09:15:28Z</dc:date>
    </item>
  </channel>
</rss>

