Hey @data-grassroots , I did some digging with our internal docs and have some suggestions/tips to help you further diagnose the issue:
You're following the recommended Databricks approach for editing Excel files: copying the template to a local path (like /local_disk0), modifying it there, and copying the result back to DBFS or a Unity Catalog volume. This method is necessary because Excel files (which are essentially zip archives) can't be edited in place on the distributed object store—direct appends or random writes aren't supported, but local_disk0 allows this.
Given that: - The file is successfully copied local, - You can further copy/move it if you comment out the ExcelWriter logic, - The error only occurs when running the Pandas ExcelWriter in append mode,
Here's how to synthesize what's likely happening and offer troubleshooting steps:
Possible Causes and Troubleshooting
1. File Path Issues
- Make sure you are using the exact local path (e.g., '/local_disk0/tmp/excel.xlsx') in Pandas' ExcelWriter.
- Double-check for typos, or if a temporary file name is being used.
- If you use %sh cp ..., remember that shell and Python might use different working directories; always specify absolute paths.
2. File Visibility and Permissions
- On cluster start or between notebook cells, /local_disk0 is ephemeral but should be visible within the same session.
- If running in a distributed setup (e.g., using Spark executors), ensure all code accessing /local_disk0 is running on the driver—not on a worker node that doesn't have the file.
- Try running
os.path.exists('/local_disk0/tmp/excel.xlsx')
just before the ExcelWriter call to confirm file presence and visibility in Python.
- When using pd.ExcelWriter with mode='a' , pandas expects the file to already exist at the specified path before it attempts to open it. The append mode does not create a new file if one doesn’t exist—this is different from standard Python file I/O behavior. The ExcelWriter with mode='a' uses openpyxl’s load_workbook() function under the hood, which requires an existing workbook to load.
3. ExcelWriter/Library Version Behavior
- Some Pandas/XlsxWriter versions have subtle bugs or incompatibilities with 'append' mode, especially if the file was created with a different engine, or is missing internal Excel structures.
- Try opening the file in Excel to confirm its usability.
- Consider using the openpyxl engine, as it typically handles appends better:
python
with pd.ExcelWriter('/local_disk0/tmp/excel.xlsx', mode='a', engine='openpyxl') as writer:
df.to_excel(writer, sheet_name='NewSheet')
- If you previously created the file with an engine other than openpyxl, try rewriting it fully first.
4. Race Conditions or File Locks
- If there's simultaneous access/modification, or if another process hasn't released the file, Pandas may not see it.
- Ensure you're not running the copy AND Pandas open concurrently.
Key Documentation Guidance
Databricks documentation explicitly warns: "Direct-append or non-sequential (random) writes are not supported. This affects operations like writing Zip and Excel files. For these workloads: Perform the operations on a local disk first. Copy the results to the volume." You are conforming to this advice.
Hope this helps you troubleshoot your problem.
Keep us posted.
Louis.