<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I'm curious if anyone has ever written a file to S3 with a custom file name? in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37319#M803</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/83800"&gt;@dsugs&lt;/a&gt;,&amp;nbsp;thanks for posting here.&lt;/P&gt;&lt;P&gt;Use repartition(1) so that Spark writes a single part file to S3, then move that file to the destination path under the file name you want.&lt;BR /&gt;You can use the snippet below:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# repartition(1) makes Spark write exactly one part file into the output_path directory.
output_df.repartition(1).write.format(file_format).mode(write_mode).option("header", "true").save(output_path)

fname = [y.name for y in dbutils.fs.ls(output_path) if y.name.startswith("part-")]
dbutils.fs.mv(output_path + "/" + fname[0], f"{output_path}.parquet")
# recurse=True is required because the directory still contains marker files such as _SUCCESS.
dbutils.fs.rm(output_path, recurse=True)&lt;/LI-CODE&gt;&lt;P&gt;The code first lists the files in the output_path directory whose names start with "part-". Spark always writes its output as part files inside a directory, and because of repartition(1) there is exactly one of them.&lt;/P&gt;&lt;P&gt;The next line moves that part file to a new file named output_path.parquet (this assumes file_format is "parquet").&lt;/P&gt;&lt;P&gt;Finally, the code deletes the output_path directory together with the marker files left in it.&lt;/P&gt;</description>
    <pubDate>Mon, 10 Jul 2023 14:48:44 GMT</pubDate>
    <dc:creator>Hemant</dc:creator>
    <dc:date>2023-07-10T14:48:44Z</dc:date>
    <item>
      <title>I'm curious if anyone has ever written a file to S3 with a custom file name?</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/36010#M774</link>
      <description>&lt;P&gt;I've been trying to write a file to an S3 bucket with a custom name, but everything I try ends up with the file being dumped into a folder with the specified name, so the output looks like ".../file_name/part-001.parquet". Instead, I want the file to show up as "/file_name.parquet".&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2023 23:04:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/36010#M774</guid>
      <dc:creator>dsugs</dc:creator>
      <dc:date>2023-06-28T23:04:13Z</dc:date>
    </item>
    <item>
      <title>Re: I'm curious if anyone has ever written a file to S3 with a custom file name?</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37308#M802</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/83800"&gt;@dsugs&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;This cannot be done directly. You can only specify the directory name. A part file is one of potentially many files that are written under this data directory. So, if you name the first one file_name.parquet, you have to name the second one file_name2.parquet, and so on. It is usually recommended not to modify the file names under the data directory. If you still want to do so, you can do a file-level copy with the dbutils.fs.cp() command and give each file a unique name in a different location.&lt;/P&gt;</description>
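The naming scheme described in the reply above (file_name.parquet, file_name2.parquet, and so on) can be sketched as a small planning function. This is only an illustration: plan_renames, source_dir, and target_dir are hypothetical names, not part of Spark or dbutils, and the commented loop assumes a Databricks notebook where dbutils is available.

```python
from typing import List, Tuple


def plan_renames(part_names: List[str], base: str, ext: str = "parquet") -> List[Tuple[str, str]]:
    """Pair each part file with a unique target name: base.ext, base2.ext, ...

    Hypothetical helper for illustration; it only builds the plan and
    does not touch any storage.
    """
    plan = []
    # Sort so the mapping is deterministic regardless of listing order.
    for index, name in enumerate(sorted(part_names), start=1):
        suffix = "" if index == 1 else str(index)
        plan.append((name, f"{base}{suffix}.{ext}"))
    return plan


# Assumed usage in a Databricks notebook (dbutils is not defined here):
# names = [f.name for f in dbutils.fs.ls(source_dir) if f.name.startswith("part-")]
# for src, dst in plan_renames(names, "file_name"):
#     dbutils.fs.cp(source_dir + "/" + src, target_dir + "/" + dst)
```

Keeping the planning step separate from the copy makes it possible to inspect the target names before any file is copied.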
      <pubDate>Mon, 10 Jul 2023 13:21:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37308#M802</guid>
      <dc:creator>Tharun-Kumar</dc:creator>
      <dc:date>2023-07-10T13:21:40Z</dc:date>
    </item>
    <item>
      <title>Re: I'm curious if anyone has ever written a file to S3 with a custom file name?</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37319#M803</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/83800"&gt;@dsugs&lt;/a&gt;,&amp;nbsp;thanks for posting here.&lt;/P&gt;&lt;P&gt;Use repartition(1) so that Spark writes a single part file to S3, then move that file to the destination path under the file name you want.&lt;BR /&gt;You can use the snippet below:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# repartition(1) makes Spark write exactly one part file into the output_path directory.
output_df.repartition(1).write.format(file_format).mode(write_mode).option("header", "true").save(output_path)

fname = [y.name for y in dbutils.fs.ls(output_path) if y.name.startswith("part-")]
dbutils.fs.mv(output_path + "/" + fname[0], f"{output_path}.parquet")
# recurse=True is required because the directory still contains marker files such as _SUCCESS.
dbutils.fs.rm(output_path, recurse=True)&lt;/LI-CODE&gt;&lt;P&gt;The code first lists the files in the output_path directory whose names start with "part-". Spark always writes its output as part files inside a directory, and because of repartition(1) there is exactly one of them.&lt;/P&gt;&lt;P&gt;The next line moves that part file to a new file named output_path.parquet (this assumes file_format is "parquet").&lt;/P&gt;&lt;P&gt;Finally, the code deletes the output_path directory together with the marker files left in it.&lt;/P&gt;</description>
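The move-and-clean-up steps from the reply above can also be sketched as a reusable function that fails loudly when the directory does not contain exactly one part file. This is a minimal sketch under assumptions: pick_single_part_file and promote_to_single_file are hypothetical names, and the fs argument is assumed to behave like dbutils.fs (ls returns entries with a name attribute, mv moves a file, rm deletes with an optional recurse flag).

```python
from typing import Iterable


def pick_single_part_file(names: Iterable[str]) -> str:
    """Return the only part file in a listing, or raise if there is not exactly one."""
    parts = [n for n in names if n.startswith("part-")]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]


def promote_to_single_file(fs, output_dir: str, final_path: str) -> str:
    """Move the lone part file in output_dir to final_path, then delete the directory.

    fs is assumed to expose the same calls as dbutils.fs:
    ls(path), mv(src, dst) and rm(path, recurse).
    """
    part = pick_single_part_file(entry.name for entry in fs.ls(output_dir))
    fs.mv(output_dir.rstrip("/") + "/" + part, final_path)
    # The directory still holds marker files, so it is removed recursively.
    fs.rm(output_dir, True)
    return final_path


# Assumed usage in a Databricks notebook, after the repartition(1) write:
# promote_to_single_file(dbutils.fs, output_path, f"{output_path}.parquet")
```

Passing the file system object in as a parameter keeps the helper testable outside a notebook, where dbutils is not defined.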
      <pubDate>Mon, 10 Jul 2023 14:48:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37319#M803</guid>
      <dc:creator>Hemant</dc:creator>
      <dc:date>2023-07-10T14:48:44Z</dc:date>
    </item>
    <item>
      <title>Re: I'm curious if anyone has ever written a file to S3 with a custom file name?</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37493#M810</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/83800"&gt;@dsugs&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Cheers!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 10:06:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37493#M810</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-07-12T10:06:08Z</dc:date>
    </item>
    <item>
      <title>Re: I'm curious if anyone has ever written a file to S3 with a custom file name?</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37529#M813</link>
      <description>&lt;P&gt;This is a Spark feature: to avoid network I/O, it writes each shuffle partition as a 'part...' file on disk, and each file, as you said, has compression and efficient encoding by default.&lt;/P&gt;&lt;P&gt;So yes, it is directly related to parallel processing!&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 21:08:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/i-m-curious-if-anyone-has-ever-written-a-file-to-s3-with-a/m-p/37529#M813</guid>
      <dc:creator>rdkarthikeyan27</dc:creator>
      <dc:date>2023-07-12T21:08:36Z</dc:date>
    </item>
  </channel>
</rss>

