Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Writing part files in single text file

Manthansingh
New Contributor

I want to write all my part files into a single text file. Is there anything I can do?

2 REPLIES

Witold
Honored Contributor

coalesce with one partition might be your friend:

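(
    df
    .coalesce(1)  # one partition, so Spark writes a single part file
    .write.format('csv')
    .option('header', 'true')
    .save('one-file.csv')  # creates a directory named one-file.csv holding that part file
)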
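Note that save() with a path like 'one-file.csv' produces a directory of that name containing one part-* file plus metadata such as _SUCCESS, not a plain file. If you need a file with an exact name, one option is to move the part file out afterwards. Here is a minimal sketch using only the Python standard library. It assumes the output directory is reachable through a local filesystem path; the helper name promote_single_part_file and the paths are hypothetical, not part of Spark.

```python
import shutil
from pathlib import Path


def promote_single_part_file(output_dir: str, target_file: str) -> Path:
    """Move the only part-* file in a Spark output directory to target_file.

    Hypothetical helper: assumes output_dir is a local path and that the
    write used a single partition, so exactly one part file exists.
    """
    # Data files are named part-*; _SUCCESS and hidden .crc files are skipped.
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly 1 part file in {output_dir}, found {len(parts)}"
        )
    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(parts[0]), str(target))
    return target
```

You would call it after the write finishes, for example promote_single_part_file('one-file.csv', 'result.csv'). On cloud storage you would do the same move with your platform's file utilities instead of shutil.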

Edthehead
Contributor II

When you write a PySpark DataFrame to a file, Spark always writes part files by default. This is because each partition is written out separately, so you get a part file even if there is only one partition.

To write a single file, you can convert the PySpark DataFrame to a pandas DataFrame and then write it to the target, like so:

df.toPandas().to_csv(file_path, header=True, index=False)

Be careful with very large datasets: toPandas() brings the data from all nodes onto the driver so that it can be written as a single output. If you run into out-of-memory (OOM) errors, you can try increasing the size of the driver node.
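If the data is too large to collect on the driver, another approach is to let Spark write its part files as usual and then concatenate them into one text file in a streaming fashion. The sketch below is only an illustration under some assumptions: the part files sit in a directory reachable through a local path, every line ends with a newline, and any header is a single line. The function name merge_part_files and its has_header parameter are hypothetical.

```python
import shutil
from pathlib import Path


def merge_part_files(output_dir: str, target_file: str, has_header: bool = False) -> int:
    """Concatenate all part-* files in output_dir into target_file.

    Hypothetical helper: returns the number of part files merged. When
    has_header is True, the first line of each part file is treated as a
    header and written only once.
    """
    # Zero-padded names (part-00000, part-00001, ...) sort in write order.
    parts = sorted(p for p in Path(output_dir).glob("part-*") if p.is_file())
    header_written = False
    with open(target_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                if has_header:
                    header = src.readline()
                    if header and not header_written:
                        out.write(header)
                        header_written = True
                # Copies in chunks, so memory use stays small for large files.
                shutil.copyfileobj(src, out)
    return len(parts)
```

For example, merge_part_files('output-dir', 'all-data.txt', has_header=True) would produce one file with a single header line. Unlike toPandas(), this never holds the whole dataset in memory.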
