Databricks Community

data-grassroots · ‎10-01-2025

I have a couple things going on here.

First, to explain what I'm doing, I'm passing an array of objects in to a function that contain a dataframe per item. I want to write those dataframes to an excel workbook - one dataframe per worksheet. That part is working. Where I run into isues is when I attempt to append worksheets to an existing workbook (ExcelWriter mode='a' instead of 'w').

I've got a template workbook that lives out on DBFS that I copy local so I can append to it. See Limitations of Working with Files where it talks about Zip and Excel files and suggests working local and copying out to a Volume after.

I successfully copy the file locally and can copy it back out elsewhere if I comment out the ExcelWriter block.

With the ExcelWriter block in place and using the append mode I receive a FileNotFoundError telling me no such file exists. That error occurs in Pandas/io/common/get_handle() when it attempts to open the file.

Anyone know what's going on with Pandas not being able to see files on local_disk0?

FileNotFoundError: [Errno 2] No such file or directory

Louis_Frolio · ‎10-03-2025

Hey @data-grassroots , I did some digging with our internal docs and have some suggestions/tips to help you further diagnose the issue:

You're following the recommended Databricks approach for editing Excel files: copying the template to a local path (like /local_disk0), modifying it there, and copying the result back to DBFS or a Unity Catalog volume. This method is necessary because Excel files (which are essentially zip archives) can't be edited in place on the distributed object store—direct appends or random writes aren't supported, but local_disk0 allows this.

Given that: - The file is successfully copied local, - You can further copy/move it if you comment out the ExcelWriter logic, - The error only occurs when running the Pandas ExcelWriter in append mode,

Here's how to synthesize what's likely happening and offer troubleshooting steps:

Possible Causes and Troubleshooting

1. File Path Issues

Make sure you are using the exact local path (e.g., '/local_disk0/tmp/excel.xlsx') in Pandas' ExcelWriter.
Double-check for typos, or if a temporary file name is being used.
If you use %sh cp ..., remember that shell and Python might use different working directories; always specify absolute paths.

2. File Visibility and Permissions

On cluster start or between notebook cells, /local_disk0 is ephemeral but should be visible within the same session.
If running in a distributed setup (e.g., using Spark executors), ensure all code accessing /local_disk0 is running on the driver—not on a worker node that doesn't have the file.
Try running os.path.exists('/local_disk0/tmp/excel.xlsx') just before the ExcelWriter call to confirm file presence and visibility in Python.
When using pd.ExcelWriter with mode='a' , pandas expects the file to already exist at the specified path before it attempts to open it. The append mode does not create a new file if one doesn’t exist—this is different from standard Python file I/O behavior. The ExcelWriter with mode='a' uses openpyxl’s load_workbook() function under the hood, which requires an existing workbook to load.

3. ExcelWriter/Library Version Behavior

Some Pandas/XlsxWriter versions have subtle bugs or incompatibilities with 'append' mode, especially if the file was created with a different engine, or is missing internal Excel structures.
Try opening the file in Excel to confirm its usability.
Consider using the openpyxl engine, as it typically handles appends better: python with pd.ExcelWriter('/local_disk0/tmp/excel.xlsx', mode='a', engine='openpyxl') as writer: df.to_excel(writer, sheet_name='NewSheet')
If you previously created the file with an engine other than openpyxl, try rewriting it fully first.

4. Race Conditions or File Locks

If there's simultaneous access/modification, or if another process hasn't released the file, Pandas may not see it.
Ensure you're not running the copy AND Pandas open concurrently.

Key Documentation Guidance

Databricks documentation explicitly warns: "Direct-append or non-sequential (random) writes are not supported. This affects operations like writing Zip and Excel files. For these workloads: Perform the operations on a local disk first. Copy the results to the volume." You are conforming to this advice.

Hope this helps you troubleshoot your problem.

Keep us posted.

Louis.

View solution in original post

data-grassroots · ‎10-01-2025

data-grassroots · ‎10-01-2025

Here's a pretty easy way to recreate the issue - simplified to ignore the ExcelWriter part...

You can see the file is copied and shows up when listed. But can't be find from Pandas. Same behavior on local_disk0 and /tmp

Louis_Frolio · ‎10-03-2025