
I'm curious if anyone has ever written a file to S3 with a custom file name?

dsugs
New Contributor II

So I've been trying to write a file to an S3 bucket with a custom name, but everything I try ends up with the file being dumped into a folder with the specified name, so the output looks like ".../file_name/part-001.parquet". Instead, I want the file to show up as "/file_name.parquet".

1 ACCEPTED SOLUTION


Hemant
Valued Contributor II

Hi @dsugs, thanks for posting here.

You need to use repartition(1) to write the output as a single-partition file to S3, then move that single file to your desired file name in the destination path.
You can use the snippet below:
output_df.repartition(1).write.format(file_format).mode(write_mode).option("header", "true").save(output_path)

fname = [f.name for f in dbutils.fs.ls(output_path) if f.name.startswith("part-")]
dbutils.fs.mv(output_path + "/" + fname[0], f"{output_path}.parquet")
dbutils.fs.rm(output_path, recurse=True)

# This code first gets a list of all the files in the output_path directory that
# start with "part-". Spark writes parquet files to the output_path directory in
# partitions, and because of repartition(1) there is exactly one part file.
# The next line moves that part file to a new file named output_path.parquet.
# Finally, the code deletes the output_path directory; recurse=True is needed
# because Spark leaves marker files (such as _SUCCESS) behind in it.

Hemant Soni
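
The same pattern can also be wrapped in a small helper. This is a minimal sketch, not part of the accepted answer; the name write_single_file and its parameters are hypothetical, and it assumes a Databricks environment where dbutils and a SparkSession are available:

def write_single_file(df, output_path, file_format="parquet"):
    # Hypothetical helper illustrating the accepted answer's pattern.
    # Write a single partition into a temporary directory.
    df.repartition(1).write.format(file_format).mode("overwrite").save(output_path)

    # Find the single part file Spark produced.
    part = [f.name for f in dbutils.fs.ls(output_path) if f.name.startswith("part-")][0]

    # Move it to the final name, then remove the directory;
    # recurse=True clears the leftover marker files (e.g. _SUCCESS).
    dbutils.fs.mv(output_path + "/" + part, f"{output_path}.{file_format}")
    dbutils.fs.rm(output_path, recurse=True)

# Example: write_single_file(output_df, "s3://my-bucket/file_name")
# produces s3://my-bucket/file_name.parquet.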


4 REPLIES

Tharun-Kumar
Databricks Employee

@dsugs 
This cannot be done directly; we can only specify the directory name. A part file is one among many files that live under this data directory, so if you named the first one file_name.parquet, you would have to name the second file_name2.parquet, and so on. It is usually recommended not to modify the file names under the data directory. But if you still want to do so, you can do a file-level copy using the dbutils.fs.cp() command and rename each file uniquely in a different location, as sketched below.
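
For illustration, a minimal sketch of that file-level copy; source_dir and dest_dir are hypothetical paths, and it assumes a Databricks notebook where dbutils is available:

# Copy each part file out of the Spark output directory under a unique name.
# source_dir and dest_dir are hypothetical; adjust them to your bucket layout.
source_dir = "s3://my-bucket/output_path"
dest_dir = "s3://my-bucket/renamed"

part_files = [f for f in dbutils.fs.ls(source_dir) if f.name.startswith("part-")]
for i, f in enumerate(part_files, start=1):
    # Produces file_name1.parquet, file_name2.parquet, ...
    dbutils.fs.cp(f.path, f"{dest_dir}/file_name{i}.parquet")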


Anonymous
Not applicable

Hi @dsugs 

Hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark that answer as best? It would be really helpful for the other members too.

Cheers!

rdkarthikeyan27
New Contributor II

This is a Spark feature: to avoid network I/O, it writes each shuffle partition as a "part-..." file on disk, and each file, as you said, gets compression and efficient encoding by default.

So yes, it is directly related to parallel processing!
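
To illustrate the point, a minimal sketch assuming a running SparkSession named spark and writable /tmp paths: the number of part- files in the output matches the DataFrame's partition count:

df = spark.range(1_000_000)
print(df.rdd.getNumPartitions())  # e.g. 8, one partition per core

# Writing produces one part- file per partition...
df.write.mode("overwrite").parquet("/tmp/many_parts")

# ...while repartition(1) yields a single part- file (plus marker files).
df.repartition(1).write.mode("overwrite").parquet("/tmp/one_part")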
