<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I write dataframe to s3 without partition column name on the path in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61127#M1203</link>
    <description>&lt;P&gt;I am currently trying to write a DataFrame to S3 like this:&lt;/P&gt;&lt;P&gt;df.write&lt;BR /&gt;.partitionBy("col1","col2")&lt;BR /&gt;.mode(&lt;SPAN&gt;"overwrite"&lt;/SPAN&gt;)&lt;BR /&gt;.format(&lt;SPAN&gt;"json"&lt;/SPAN&gt;)&lt;BR /&gt;.save(&lt;SPAN&gt;"s3a://my_bucket/"&lt;/SPAN&gt;)&lt;/P&gt;&lt;P&gt;The path becomes &lt;SPAN&gt;`s3a://my_bucket/col1=abc/col2=opq/`&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But I want the path to be `s3a://my_bucket/abc/opq/`&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there a way to write the DataFrame so that the path does not include the partition column names?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Feb 2024 14:28:51 GMT</pubDate>
    <dc:creator>Jennifer</dc:creator>
    <dc:date>2024-02-19T14:28:51Z</dc:date>
    <item>
      <title>How do I write dataframe to s3 without partition column name on the path</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61127#M1203</link>
      <description>&lt;P&gt;I am currently trying to write a DataFrame to S3 like this:&lt;/P&gt;&lt;P&gt;df.write&lt;BR /&gt;.partitionBy("col1","col2")&lt;BR /&gt;.mode(&lt;SPAN&gt;"overwrite"&lt;/SPAN&gt;)&lt;BR /&gt;.format(&lt;SPAN&gt;"json"&lt;/SPAN&gt;)&lt;BR /&gt;.save(&lt;SPAN&gt;"s3a://my_bucket/"&lt;/SPAN&gt;)&lt;/P&gt;&lt;P&gt;The path becomes &lt;SPAN&gt;`s3a://my_bucket/col1=abc/col2=opq/`&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;But I want the path to be `s3a://my_bucket/abc/opq/`&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Is there a way to write the DataFrame so that the path does not include the partition column names?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2024 14:28:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61127#M1203</guid>
      <dc:creator>Jennifer</dc:creator>
      <dc:date>2024-02-19T14:28:51Z</dc:date>
    </item>
    <item>
      <title>Re: How do I write dataframe to s3 without partition column name on the path</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61135#M1205</link>
      <description>&lt;P&gt;Thanks for the quick reply. If I use the new column for partitioning, I think "combined_col" will still be in the path.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Feb 2024 14:59:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61135#M1205</guid>
      <dc:creator>Jennifer</dc:creator>
      <dc:date>2024-02-19T14:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: How do I write dataframe to s3 without partition column name on the path</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61426#M1206</link>
      <description>&lt;P&gt;What I did in the end was write the files to DBFS first and then move them to S3, which let me use a custom path and file name. It also let me avoid writing commit files to S3.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2024 09:28:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/61426#M1206</guid>
      <dc:creator>Jennifer</dc:creator>
      <dc:date>2024-02-22T09:28:57Z</dc:date>
    </item>
    <item>
      <title>Re: How do I write dataframe to s3 without partition column name on the path</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/64310#M1245</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/30638"&gt;@Jennifer&lt;/a&gt;&amp;nbsp;,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The default behavior of the&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;.partitionBy()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;function in Spark is to create a directory structure that includes the partition column names. This follows Hive's partitioning scheme, which lets Spark discover the partition columns from the path when the data is read back and skip partitions that a query filters out. Hence, you cannot directly change this behavior to remove partition column names from the path. &lt;/SPAN&gt;&lt;SPAN&gt;However, you can get the directory structure you want with a workaround. After saving the DataFrame, move the objects in your S3 bucket to paths without the partition column names. This has to be done outside of Spark, using the AWS SDK or CLI. &lt;/SPAN&gt;&lt;SPAN&gt;Here is an example of how you can do it using the AWS CLI:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE class="c-mrkdwn__pre" data-stringify-type="pre"&gt;aws s3 mv s3://my_bucket/col1=abc s3://my_bucket/abc --recursive&lt;BR /&gt;aws s3 mv s3://my_bucket/abc/col2=opq s3://my_bucket/abc/opq --recursive&lt;/PRE&gt;
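&lt;P&gt;&lt;SPAN&gt;The same move can be scripted with the AWS SDK. Below is a minimal Python sketch, assuming boto3 is installed and configured with credentials. The helper names strip_partition_names and move_objects are hypothetical, introduced here for illustration; the bucket and column names come from the question. It is not a definitive implementation: copy_object handles objects up to 5 GB, and larger ones would need a multipart copy.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE class="c-mrkdwn__pre" data-stringify-type="pre"&gt;
```python
def strip_partition_names(key: str, partition_cols: list[str]) -> str:
    """Map 'col1=abc/col2=opq/part-0.json' to 'abc/opq/part-0.json'."""
    parts = []
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        # Only rewrite segments that look like name=value for a known partition column.
        if sep and name in partition_cols:
            parts.append(value)
        else:
            parts.append(segment)
    return "/".join(parts)


def move_objects(bucket: str, prefix: str, partition_cols: list[str]) -> None:
    """Copy each object under prefix to its stripped key, then delete the original."""
    import boto3  # third-party; assumed installed and configured with credentials

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            old_key = obj["Key"]
            new_key = strip_partition_names(old_key, partition_cols)
            if new_key == old_key:
                continue  # nothing to rename, e.g. marker files at the root
            # S3 has no rename operation, so a move is a copy followed by a delete.
            s3.copy_object(
                Bucket=bucket,
                Key=new_key,
                CopySource={"Bucket": bucket, "Key": old_key},
            )
            s3.delete_object(Bucket=bucket, Key=old_key)
```
&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;For the layout in the question, move_objects("my_bucket", "", ["col1", "col2"]) would move objects under col1=abc/col2=opq/ to abc/opq/. Files outside the partition directories, such as _SUCCESS, keep their keys.&lt;/SPAN&gt;&lt;/P&gt;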
&lt;P&gt;&lt;SPAN&gt;Please note that S3 has no native rename: each move copies the object and then deletes the original, so moving objects this way can be time-consuming if you have a large number of files.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Mar 2024 16:00:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/how-do-i-write-dataframe-to-s3-without-partition-column-name-on/m-p/64310#M1245</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2024-03-21T16:00:00Z</dc:date>
    </item>
  </channel>
</rss>

