Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to write only a single file to Blob or ADLS from Databricks?

Simha
New Contributor II

Hi All,

I am trying to write a CSV file to Blob storage and ADLS from a Databricks notebook using PySpark. Instead of a single file, a separate folder is created with the filename I specified, and a partition (part) file is created inside that folder.

I want only the file to be written.

Can anyone help me fix this issue?

Thanks in advance.


Lakshay
Databricks Employee

Hi @Simha, this is expected behavior. Spark always creates an output directory when writing data and splits the result into multiple part files, because multiple executors write to the output directory in parallel. Spark cannot be made to write a file without creating the output directory.

However, you can control the number of part files written to the output directory with the coalesce function. To get a single output file, call coalesce(1) on the DataFrame before the write. Choose the partition count carefully, though: coalesce(1) brings all the data to a single executor, and if the data volume is large, that executor can run out of memory (OOM).
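
If you need a lone file at a specific path, one common workaround is to write with coalesce(1) into a temporary directory and then move the single part file to the final name. The sketch below assumes a Databricks notebook, where `dbutils` is available as a global, and assumes the output is small enough for one task. The function names, `tmp_dir`, and `dest_path` are illustrative, not part of any Spark API.

```python
def pick_part_file(names):
    """Return the one Spark part file name from a directory listing.

    Spark also writes marker files such as _SUCCESS; those are ignored.
    """
    parts = [n for n in names if n.startswith("part-") and n.endswith(".csv")]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]


def write_single_csv(df, dbutils, tmp_dir, dest_path):
    """Write df as one CSV file at dest_path (hypothetical helper).

    df      : a PySpark DataFrame
    dbutils : the Databricks notebook utility object
    tmp_dir : scratch directory that Spark will create and fill
    """
    # coalesce(1) funnels all rows through a single task, so Spark emits
    # exactly one part file. Only safe when the data fits on one executor.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

    # Locate the part file among Spark's marker files.
    names = [f.name for f in dbutils.fs.ls(tmp_dir)]
    part = pick_part_file(names)

    # Move it to the final file name, then delete the scratch directory.
    dbutils.fs.mv(tmp_dir.rstrip("/") + "/" + part, dest_path)
    dbutils.fs.rm(tmp_dir, True)
```

In a notebook this could be called as `write_single_csv(df, dbutils, "<storage-path>/tmp_out", "<storage-path>/result.csv")`, where the `<storage-path>` placeholders stand for your own Blob or ADLS location. Spark still creates a directory during the write; the helper only tidies up afterwards.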
