<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to create a single CSV file with specified file name Spark in Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-single-csv-file-with-specified-file-name-spark/m-p/83060#M36831</link>
    <description>&lt;P&gt;&lt;SPAN&gt;I know how to write a CSV with Spark in Databricks, but the output always comes with a lot of extra files.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For example, here is my code:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;file_path = "dbfs:/mnt/target_folder/file.csv"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").csv(file_path, header=True)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What I get is:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;A folder named file.csv&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Inside that folder, files called `_committed_xxxx`, `_started_xxxx` and `_SUCCESS`&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Multiple files named `part-xxxx`&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;What I want is a &lt;STRONG&gt;SINGLE CSV file&lt;/STRONG&gt; named `file.csv`. How can I achieve this?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I also tried the pandas to_csv function, but it does not work in a Databricks notebook. The error is "OSError: Cannot save file into a non-existent directory".&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 15 Aug 2024 08:04:26 GMT</pubDate>
    <dc:creator>guangyi</dc:creator>
    <dc:date>2024-08-15T08:04:26Z</dc:date>
    <item>
      <title>How to create a single CSV file with specified file name Spark in Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-single-csv-file-with-specified-file-name-spark/m-p/83060#M36831</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I know how to write a CSV with Spark in Databricks, but the output always comes with a lot of extra files.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For example, here is my code:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;file_path = "dbfs:/mnt/target_folder/file.csv"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").csv(file_path, header=True)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What I get is:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;A folder named file.csv&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Inside that folder, files called `_committed_xxxx`, `_started_xxxx` and `_SUCCESS`&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Multiple files named `part-xxxx`&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;What I want is a &lt;STRONG&gt;SINGLE CSV file&lt;/STRONG&gt; named `file.csv`. How can I achieve this?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I also tried the pandas to_csv function, but it does not work in a Databricks notebook. The error is "OSError: Cannot save file into a non-existent directory".&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2024 08:04:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-single-csv-file-with-specified-file-name-spark/m-p/83060#M36831</guid>
      <dc:creator>guangyi</dc:creator>
      <dc:date>2024-08-15T08:04:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to create a single CSV file with specified file name Spark in Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-a-single-csv-file-with-specified-file-name-spark/m-p/83078#M36839</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/109070"&gt;@guangyi&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;To stop Spark from writing the _committed_xxxx, _started_xxxx and _SUCCESS files, set the following Spark options before the write:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Skip the _SUCCESS marker files
spark.conf.set("spark.databricks.io.directoryCommit.createSuccessFile", "false")
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
# Use the standard Hadoop commit protocol, which does not write _started_xxxx / _committed_xxxx files
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")&lt;/LI-CODE&gt;&lt;P&gt;And if you want a single CSV file, call coalesce(1) before the write operation:&lt;/P&gt;&lt;PRE&gt;df.coalesce(1).write.mode("overwrite").csv(file_path, header=True)&lt;/PRE&gt;&lt;P&gt;Spark still creates a folder at file_path, but it now contains a single part-xxxx file.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2024 11:07:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-a-single-csv-file-with-specified-file-name-spark/m-p/83078#M36839</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-08-15T11:07:24Z</dc:date>
    </item>
  </channel>
</rss>

