Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

workflow permissions errors

BillMarshall
New Contributor

I have a notebook that outputs an Excel file. Through trial and error, and after consulting various forums, I discovered that the .xlsx file needed to be written to a temp file and then copied to the volume in Unity Catalog.
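
For reference, the temp-file-then-copy pattern looks roughly like this (a minimal sketch; the volume path and file name are placeholders, not my actual values):

import pandas as pd

local_path = "/local_disk0/tmp/report.xlsx"     # driver-local temp file
volume_dir = "/Volumes/main/default/reports"    # hypothetical UC volume path

# write the workbook to the driver's local disk first
pd.DataFrame({"a": [1, 2, 3]}).to_excel(local_path, index=False)

# then copy it into the Unity Catalog volume (dbutils is the notebook global)
dbutils.fs.cp(f"file:{local_path}", f"{volume_dir}/report.xlsx")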

When I run the notebook by itself it works perfectly.

BUT, when I schedule the job in a workflow it fails with the following error:

PermissionError: [Errno 13] Permission denied: '/local_disk0/tmp/202407-arf743-Section1.xlsx'

The workflow is using the same cluster as when I run the notebook alone.

The workflow has the "Run as" value set to my user ID.

Any help would be appreciated.

2 REPLIES

florence023
New Contributor III



Hello,

It sounds like you're encountering a permissions issue when your workflow tries to write to the temporary directory. Here are a few steps you can take to troubleshoot and resolve this issue:

1. Check file permissions: ensure that the directory /local_disk0/tmp/ has the correct permissions set for your user. You can use the os.chmod() function in Python to change the file permissions if necessary.
2. Verify user permissions: double-check that the user ID the workflow runs as has the necessary permissions to write to the specified directory. Even if the "Run as" value is set to your user ID, there might be additional restrictions in place.
3. Temporary directory path: ensure that the path to the temporary directory is correctly specified and accessible. Paths can differ slightly between environments.
4. Cluster configuration: verify that the cluster configuration is consistent between running the notebook manually and running it as part of the workflow. There might be differences in environment variables or permissions.
5. Use a different temporary directory: if possible, try a temporary directory that you know has the correct permissions. This can help isolate whether the issue is specific to /local_disk0/tmp/ (see the sketch after this list).
6. Check for open files: ensure that the file isn't being accessed by another process or application, which might be causing the permission error.
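
For points 1 and 5, a minimal sketch (assuming a notebook context; the volume path is a placeholder) would be:

import os
import shutil
import tempfile

import pandas as pd

# a fresh, per-run temp directory is created by and writable for the current
# user, avoiding collisions with files other runs left in /local_disk0/tmp
with tempfile.TemporaryDirectory() as tmp_dir:
    local_file = os.path.join(tmp_dir, "report.xlsx")
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(local_file, index=False)
    # if you must reuse a fixed path instead, os.chmod(path, 0o644) can adjust it
    shutil.copy(local_file, "/Volumes/main/default/reports/report.xlsx")  # placeholder volume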

Hope this helps you.
Best regards,
florence023

emora
New Contributor III

Hello, yes, you do need to write the Excel file to a temp folder first, but then you can move it wherever you want without a problem. In my current project we implemented the method below to create the file in the temp folder and then move it to a specific folder in an Azure Blob container.

import pandas as pd


def export_to_excel(dbutils, df, file_name, result_path, tab_names=("Sheet1", "Sheet2"), secondary_df=None):
    # tab_names and secondary_df were not defined in the original snippet; they
    # are exposed here as parameters (secondary_df is an optional pandas
    # DataFrame written to a second sheet)
    from pyspark.sql.functions import row_number, monotonically_increasing_id
    from pyspark.sql import Window

    # Excel's row limit, minus one row for the header
    excel_limit = 1048576 - 1

    num_rows = df.count()
    num_files = num_rows / excel_limit

    if num_files < 1:
        # write to the driver's local disk first; the context manager saves the
        # file on exit, so no explicit writer.save() is needed
        with pd.ExcelWriter(f"/databricks/driver/{file_name}.xlsx", engine="xlsxwriter") as writer:
            df_export = df.toPandas()
            df_export.to_excel(writer, sheet_name=tab_names[0], index=False)
            if secondary_df is not None:
                secondary_df.to_excel(writer, sheet_name=tab_names[1], index=False)

        print(f"moving fileName: {file_name} from src directory: /databricks/driver/ to dst directory: {result_path}")
        dbutils.fs.mv(f"file:/databricks/driver/{file_name}.xlsx", f"{result_path}{file_name}.xlsx")

    else:
        # number the rows so the DataFrame can be split into Excel-sized chunks
        df = df.withColumn("row_idx", row_number().over(Window.orderBy(monotonically_increasing_id())))

        i = 0
        while i < num_files:
            # row_number() starts at 1, so chunk i holds rows
            # excel_limit*i + 1 through excel_limit*(i+1)
            df_export = df.filter((df.row_idx > excel_limit * i) & (df.row_idx <= excel_limit * (i + 1)))

            i = i + 1
            print(f"saving {df_export.count()} rows in {file_name}_part{i}.xlsx")
            df_export = df_export.drop("row_idx").toPandas()
            # save on the driver's local disk first to avoid write exceptions
            df_export.to_excel(f"/databricks/driver/{file_name}_part{i}.xlsx", engine="xlsxwriter", index=False, header=True)
            # move the file from the cluster's local disk to the destination
            print(f"moving fileName: {file_name}_part{i} from src directory: /databricks/driver/ to dst directory: {result_path}")
            dbutils.fs.mv(f"file:/databricks/driver/{file_name}_part{i}.xlsx", f"{result_path}{file_name}_part{i}.xlsx")

PS: You need to install the xlsxwriter Python library in order to execute this method and generate the Excel file.
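
For example, a call might look like this (assuming the helper above is defined in the notebook; the source table and volume path are placeholders):

# in its own notebook cell first: %pip install xlsxwriter
export_to_excel(
    dbutils,
    spark.table("my_catalog.my_schema.my_table"),             # placeholder source table
    file_name="monthly_report",
    result_path="/Volumes/my_catalog/my_schema/my_volume/",   # placeholder destination
)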
