How to write only file on to the Blob or ADLS from Databricks?

Simha — Wed, 17 Jan 2024 12:43:42 GMT

Hi All,

I am trying to write a CSV file to Blob Storage and ADLS from a Databricks notebook using PySpark. Instead of a single file, a separate folder is created with the filename I specified, and a partition file is created inside that folder.

I want only the file to be written.

Can anyone help me fix this issue?

Thanks in advance.

Re: How to write only file on to the Blob or ADLS from Databricks?

Lakshay — Wed, 17 Jan 2024 15:49:03 GMT

Hi @Simha, this is expected behavior. Spark always creates an output directory when writing data and splits the result into multiple part files, because multiple executors write to the output directory in parallel. We cannot make Spark write a file without creating the output directory.

However, we can control the number of part files written to the output directory with the coalesce function. To get a single part file, use coalesce(1) before the write operation. Choose the partition count carefully, though: coalesce(1) brings all the data to a single executor, and if the data volume is large, that executor can run out of memory (OOM).
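
For illustration, here is a rough sketch of how coalesce(1) is often combined with a move step to end up with one file at a chosen path. Spark still creates the output directory; the sketch simply moves the lone part file out of it afterwards. The helper names (pick_part_file, write_single_csv), the "_tmp" scratch suffix, and the header option are my own assumptions, not something from the question. It also assumes df is an existing DataFrame and dbutils is the object available in a Databricks notebook.

```python
def pick_part_file(paths, suffix=".csv"):
    """Return the one Spark part file among the listed paths (hypothetical helper)."""
    parts = [
        p for p in paths
        if p.rsplit("/", 1)[-1].startswith("part-") and p.endswith(suffix)
    ]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]


def write_single_csv(df, dbutils, target_path):
    """Write df as a single CSV at target_path (sketch; needs a Databricks notebook)."""
    tmp_dir = target_path + "_tmp"  # assumed scratch directory next to the target
    # coalesce(1) funnels every row through one task, so only one part file is written.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)
    # The directory also holds marker files such as _SUCCESS; keep only the part file.
    listed = [f.path for f in dbutils.fs.ls(tmp_dir)]
    dbutils.fs.mv(pick_part_file(listed), target_path)
    dbutils.fs.rm(tmp_dir, True)  # True = delete the scratch directory recursively
```

In a notebook this would be called as write_single_csv(df, dbutils, "<your target path>.csv"). The same OOM caveat applies, since all the data still passes through one executor.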
