<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Simply writing a dataframe to a CSV file (non-partitioned) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27822#M19670</link>
    <description>&lt;P&gt;Thanks for confirming that that's the only way &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;   &lt;/P&gt;</description>
    <pubDate>Thu, 17 Feb 2022 07:28:41 GMT</pubDate>
    <dc:creator>Bilal1</dc:creator>
    <dc:date>2022-02-17T07:28:41Z</dc:date>
    <item>
      <title>Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27818#M19666</link>
      <description>&lt;P&gt;When writing a DataFrame in PySpark to a CSV file, a folder is created containing a partitioned CSV file. I then have to rename this file in order to distribute it to my end user.&lt;/P&gt;&lt;P&gt;Is there any way I can simply write my data to a single CSV file, with a name I specify, in a folder I specify?&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 06:37:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27818#M19666</guid>
      <dc:creator>Bilal1</dc:creator>
      <dc:date>2022-02-17T06:37:25Z</dc:date>
    </item>
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27819#M19667</link>
      <description>&lt;P&gt;Yes, but you have to call coalesce(1). This generates a single CSV file; however, you will also lose some parallelism, because coalesce(1) is propagated upstream: the work since the last shuffle also runs in a single task.&lt;/P&gt;&lt;P&gt;Also, do not forget to disable the writing of _SUCCESS and similar marker files (see &lt;A href="https://community.databricks.com/s/feed/0D53f00001hXcI3CAK" alt="https://community.databricks.com/s/feed/0D53f00001hXcI3CAK" target="_blank"&gt;this topic&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 07:13:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27819#M19667</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-17T07:13:58Z</dc:date>
    </item>
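    <!--
    The marker files mentioned in the reply above can usually be suppressed through session configuration. A minimal sketch, assuming an active SparkSession named `spark` (as in a Databricks notebook). The first key is the standard Hadoop FileOutputCommitter switch for _SUCCESS; the second, which swaps in Spark's stock commit protocol to avoid the Databricks _started_* and _committed_* files, is an assumption to verify against your runtime before relying on it.

    ```scala
    // Assumes an active SparkSession named `spark` (e.g. a Databricks notebook).

    // Hadoop's FileOutputCommitter writes an empty _SUCCESS file on job commit;
    // this flag turns that marker off for subsequent writes in this session.
    spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

    // Assumption: on Databricks the default commit protocol also writes
    // _started_* and _committed_* files; falling back to Spark's stock
    // protocol avoids them, but gives up Databricks' transactional commit.
    spark.conf.set(
      "spark.sql.sources.commitProtocolClass",
      "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")
    ```

    Both settings apply only to the current session, so they can be set right before the coalesce(1) write and left at their defaults elsewhere.
    -->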
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27820#M19668</link>
      <description>&lt;P&gt;Thanks, Werners. However, it still writes to a folder, and I still need to rename the file, copy it out, etc.&lt;/P&gt;&lt;P&gt;I would like test1.csv to be a file in the root folder, not a folder.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2083i64D2E773309D5848/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 07:20:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27820#M19668</guid>
      <dc:creator>Bilal1</dc:creator>
      <dc:date>2022-02-17T07:20:08Z</dc:date>
    </item>
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27821#M19669</link>
      <description>&lt;P&gt;It will always write to a folder because of the parallel nature of Spark.&lt;/P&gt;&lt;P&gt;If that is an issue, you can use the %sh magic command to move the .csv file up one level and rename it at the same time.&lt;/P&gt;&lt;P&gt;In other words, use the 'mv' command.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 07:26:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27821#M19669</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-02-17T07:26:43Z</dc:date>
    </item>
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27822#M19670</link>
      <description>&lt;P&gt;Thanks for confirming that that's the only way &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;   &lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 07:28:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27822#M19670</guid>
      <dc:creator>Bilal1</dc:creator>
      <dc:date>2022-02-17T07:28:41Z</dc:date>
    </item>
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27823#M19671</link>
      <description>&lt;P&gt;The CSV file will have a random name. Can you show me how to rename it without the hassle of copying its name?&lt;/P&gt;&lt;P&gt;For example, let's say the root folder is named Main. Inside Main, I wrote a CSV using coalesce(1), and the structure is Main/data.csv/RandomBigName-part-00000xyz.csv.&lt;/P&gt;&lt;P&gt;Now I want to move the CSV file into the Main folder and name it, say, dummyData.csv, so the final structure I want is Main/dummyData.csv.&lt;/P&gt;&lt;P&gt;Please help.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 08:56:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/27823#M19671</guid>
      <dc:creator>krutarth</dc:creator>
      <dc:date>2022-07-08T08:56:36Z</dc:date>
    </item>
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/56808#M30656</link>
      <description>&lt;P&gt;Could you please provide an example of using %sh or mv to move and rename the CSV file?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2024 17:41:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/56808#M30656</guid>
      <dc:creator>Nw2this</dc:creator>
      <dc:date>2024-01-09T17:41:56Z</dc:date>
    </item>
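    <!--
    For the Main/data.csv to Main/dummyData.csv case asked about above, the part file can be found by its .csv suffix instead of by copying its random name. A minimal sketch using only JDK file APIs, so it runs on the driver against a local path. It assumes the Spark output folder is reachable as a local path (on Databricks, DBFS is usually exposed under /dbfs/, the same view %sh mv uses), that the folder was written with coalesce(1) so it holds exactly one .csv file, and that it contains only plain files. The SingleCsv object and promoteSingleCsv method are hypothetical names.

    ```scala
    import java.io.File
    import java.nio.file.{Files, Path, Paths, StandardCopyOption}

    object SingleCsv {
      /** Moves the single .csv part file out of a Spark output folder to `target`,
        * then deletes the folder and whatever is left in it (_SUCCESS, .crc files).
        * Hypothetical helper: assumes `outputDir` is a local (or FUSE-mounted) path. */
      def promoteSingleCsv(outputDir: Path, target: Path): Path = {
        val entries: Array[File] =
          Option(outputDir.toFile.listFiles()).getOrElse(Array.empty[File])

        // Match on the ".csv" suffix only, so the random part of the name does not matter.
        // Hidden checksum files such as .part-00000.csv.crc end in ".crc" and are skipped.
        val csvFiles = entries.filter(f => f.isFile && f.getName.endsWith(".csv"))
        require(csvFiles.length == 1,
          s"expected exactly one .csv file in $outputDir, found ${csvFiles.length}")

        // Move and rename in one step; an existing file at `target` is replaced.
        Files.move(csvFiles.head.toPath, target, StandardCopyOption.REPLACE_EXISTING)

        // Remove the leftover marker files, then the now-empty folder.
        Option(outputDir.toFile.listFiles()).getOrElse(Array.empty[File])
          .foreach(f => Files.delete(f.toPath))
        Files.delete(outputDir)
        target
      }
    }

    // Usage for the layout described above (paths are illustrative):
    // SingleCsv.promoteSingleCsv(Paths.get("/dbfs/Main/data.csv"), Paths.get("/dbfs/Main/dummyData.csv"))
    ```

    The require check fails loudly if the folder was not written with coalesce(1), rather than silently picking one of several part files.
    -->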
    <item>
      <title>Re: Simply writing a dataframe to a CSV file (non-partitioned)</title>
      <link>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/92785#M38541</link>
      <description>&lt;P&gt;I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;// Set the temporary output directory and the desired final file path&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;val&lt;/SPAN&gt; &lt;SPAN&gt;tempDir&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"/tmp/your_file_name"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;val&lt;/SPAN&gt; &lt;SPAN&gt;finalOutputPath&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;"/tmp/your_file_name.csv"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;// Get the DataFrame whose data should end up in the CSV file&lt;/DIV&gt;&lt;DIV&gt;val df =&amp;nbsp;&lt;SPAN&gt;spark.table(&lt;/SPAN&gt;&lt;SPAN&gt;"your_table_name"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;// Write the DataFrame as a single partition in the temporary directory&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df.coalesce(&lt;/SPAN&gt;&lt;SPAN&gt;1&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; .write&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; .mode(&lt;/SPAN&gt;&lt;SPAN&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"header"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; .csv(tempDir)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;// List the files in the temporary directory to find the CSV file&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;val&lt;/SPAN&gt; &lt;SPAN&gt;csvFile&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; dbutils.fs.ls(tempDir).filter(file &lt;/SPAN&gt;&lt;SPAN&gt;=&amp;gt;&lt;/SPAN&gt;&lt;SPAN&gt; file.name.endsWith(&lt;/SPAN&gt;&lt;SPAN&gt;".csv"&lt;/SPAN&gt;&lt;SPAN&gt;))(&lt;/SPAN&gt;&lt;SPAN&gt;0&lt;/SPAN&gt;&lt;SPAN&gt;).path&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;// Move and rename the CSV file to the desired location&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;dbutils.fs.mv(csvFile, finalOutputPath)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;// Remove the temporary directory&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;dbutils.fs.rm(tempDir, &lt;/SPAN&gt;&lt;SPAN&gt;true&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 04 Oct 2024 17:41:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simply-writing-a-dataframe-to-a-csv-file-non-partitioned/m-p/92785#M38541</guid>
      <dc:creator>chris0706</dc:creator>
      <dc:date>2024-10-04T17:41:14Z</dc:date>
    </item>
  </channel>
</rss>

