
Writing data to a .csv file (in the Databricks Free Edition)

DanielW
New Contributor III

 

I have been testing out a notebook. Is it possible to write tabular output to a .csv file, and if so, should this go to the filesystem or an S3 bucket?

I get errors when I try either approach with the snippet below.

# Sort by concert date to get the most recent concert
concerts_df = concerts_df.sort_values(by='concert_date', ascending=False)

# Ensure the directory exists using Databricks utilities
output_dir = '/dbfs/tmp'
dbutils.fs.mkdirs(output_dir)

# Write the DataFrame to a CSV file
#concerts_df.to_csv(f'{output_dir}/blacksabbath_concerts.csv', index=False)

# Define the S3 bucket and file path
#bucket_name = 'my-s3-bucket'
#file_path = f's3://{bucket_name}/blacksabbath_concerts.csv'

display(concerts_df)

 

 

 

1 ACCEPTED SOLUTION

ilir_nuredini
Honored Contributor

Hello @DanielW 

DBFS (which your output_dir variable points to) is now considered a legacy approach; the recommended way to store and access data files going forward is Unity Catalog (UC) Volumes. Note that DBFS is disabled in the Free Edition. See below for how you can use a UC Volume to work with files, using CSV as an example.

Example: writing to a UC Volume using Python.

1. Using pandas to save a CSV file with example data:

import pandas as pd

volume_path = "/Volumes/workspace/default/temp/output.csv"

# Example data
df = pd.DataFrame([
    ["Ilir", 30],
    ["Nuredini", 25]
], columns=["name", "age"])

# Save to a Unity Catalog volume path
df.to_csv(volume_path, index=False)

2. Using with open():

import csv

volume_path = "/Volumes/workspace/default/temp/output2.csv"

rows = [["name", "age"], ["Ilir2", 30], ["Nuredini2", 25]]

# Write CSV using `with open`
with open(volume_path, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerows(rows)

 

Here is an example of how to read a CSV file from a Volume:

df = spark.read.csv("/Volumes/workspace/default/temp/output2.csv", header=True, inferSchema=True)
df.show()
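And if you prefer to stay in Spark for the write as well, here is a minimal sketch (the output path is just an example; note that Spark writes a directory of part files rather than a single CSV file):

spark_df = spark.createDataFrame(
    [("Ilir", 30), ("Nuredini", 25)],
    ["name", "age"]
)

# Spark writes a directory of part files, not a single CSV file
(spark_df.write
    .mode("overwrite")
    .option("header", True)
    .csv("/Volumes/workspace/default/temp/output_spark"))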

 

Hope that helps. Let me know if you need a more specific scenario.

Best, Ilir


4 REPLIES


Is there no PySpark syntax to write a file as CSV? It was so easy to write in Community Edition before.

DanielW
New Contributor III

Hi @ilir_nuredini ,

Thanks very much for the advice, as this really helped, especially the point that DBFS is not available on the Free Edition.

I ran into an error and realised I needed to create a volume and schema first. So, with a little help from the AI assistant:

-- Create a catalog
CREATE CATALOG my_catalog;

-- Create a schema within the catalog
CREATE SCHEMA my_catalog.my_schema;

-- Create a volume within the schema
CREATE VOLUME my_catalog.my_schema.my_volume;

Then the necessary permissions, just for me at the moment:

%sql
-- Grant privileges
GRANT USE CATALOG ON CATALOG my_catalog TO `your_user_or_group`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `your_user_or_group`;
GRANT WRITE VOLUME ON VOLUME my_catalog.my_schema.my_volume TO `your_user_or_group`;

 

Thereafter, I can reference the volume I need to write to:

#create the volume path
filename = "output.csv"
volume_path = "/Volumes/my_catalog/my_schema/my_volume/" + filename

# then write my output
concert_output_df = pd.DataFrame(concerts_df[['concert_id',bandname,......
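The complete write then looks something like this (the column names here are hypothetical stand-ins, since my actual ones are trimmed above):

# Hypothetical column names, standing in for the actual concert columns
concert_output_df = pd.DataFrame(
    concerts_df[['concert_id', 'band_name', 'concert_date']]
)

# Write the selected columns to the UC Volume as a single CSV
concert_output_df.to_csv(volume_path, index=False)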

 

I am not sure of the naming conventions I should use for creating volumes and schemas, so I would welcome any advice. I will try out the other snippets you suggested, as I am keen to create a few tables with these output files. Thanks again.
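For the tables, I am planning something along these lines, reusing the catalog and schema created above (the table name is a placeholder):

# Read the CSV back from the Volume and save it as a managed table
df = spark.read.csv(
    "/Volumes/my_catalog/my_schema/my_volume/output.csv",
    header=True,
    inferSchema=True
)
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.concerts")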

Hello @DanielW ,

Glad it helped. While there is no single way to organize UC that fits every organization, I would highly recommend this article, which is super helpful for deciding which naming convention fits your scenario best:

https://bpcs.com/blog/unity-catalog-an-easy-guide-to-naming-conventions
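As one illustration, a common pattern is an <environment>_<domain> style for catalogs, with domain-based schemas underneath (these names are made up for the example, not taken from the article):

-- Hypothetical names illustrating an <environment>_<domain> convention
CREATE CATALOG dev_music;
CREATE SCHEMA dev_music.concerts;
CREATE VOLUME dev_music.concerts.raw_files;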

If the reply was helpful, it would be great if you could accept it as a solution so fellow colleagues can benefit from this too. Thank you!

Best, Ilir 
