Data Engineering

How to write only a single file to Blob or ADLS from Databricks?

Simha
New Contributor II

Hi All,

I am trying to write a CSV file to Blob storage and ADLS from a Databricks notebook using PySpark. Instead of a single file, a separate folder is created with the specified filename, and a partition file is created inside that folder.

I want only the file to be written.

Can anyone help me fix this issue?

Thanks in advance.

1 REPLY

Lakshay
Databricks Employee

Hi @Simha, this is expected behavior. Spark always creates an output directory when writing data and splits the result into multiple part files, because multiple executors write to the output directory in parallel. We cannot make Spark write a file without creating the output directory.

However, we can control the number of part files written to the output directory with the coalesce function. To get a single output file, use coalesce(1) in the write operation. Choose the number of partitions carefully, though: coalesce(1) brings all the data to a single executor, and if the data volume is large, that executor can run out of memory (OOM).
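As a rough sketch of how this could look in a notebook: the helper below writes with coalesce(1) into a temporary directory, then moves the single part file to the final filename and deletes the directory. The function name `write_single_csv`, its parameters, and the paths in the usage comment are hypothetical and only for illustration. It assumes `fs` behaves like `dbutils.fs` (with `ls`, `mv`, and `rm`) and that the data is small enough to fit on one executor.

```python
def write_single_csv(df, fs, tmp_dir, final_path):
    """Write df as one CSV file at final_path (hypothetical helper).

    df         : a Spark DataFrame
    fs         : an object like dbutils.fs, providing ls, mv and rm
    tmp_dir    : scratch directory that Spark will create and fill
    final_path : full path of the single CSV file you actually want
    """
    # coalesce(1) merges everything into one partition, so Spark
    # produces exactly one part file inside tmp_dir.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

    # Spark also writes marker files such as _SUCCESS; pick only the data file.
    parts = [f.path for f in fs.ls(tmp_dir) if f.name.startswith("part-")]
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")

    # Move the part file to the desired name, then remove the scratch directory.
    fs.mv(parts[0], final_path)
    fs.rm(tmp_dir, True)
    return final_path


# Example call in a Databricks notebook (paths are placeholders, not real ones):
# write_single_csv(df, dbutils.fs,
#                  "abfss://<container>@<account>.dfs.core.windows.net/tmp/report_csv",
#                  "abfss://<container>@<account>.dfs.core.windows.net/out/report.csv")
```

Spark still creates the output directory here; the helper only cleans it up afterwards, so the end result is a single file at the path you chose.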
