Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Copy files from /tmp to abfss location

deepu
New Contributor II

I have a notebook that generates a number of Excel and PDF reports. These reports need to be sent out through email and also archived in an external location.

I am able to generate these reports in /tmp and then send them as attachments. But when I try to copy them to the external location for archiving, it throws the error: insufficient access.

It's a Unity Catalog-enabled Databricks instance.

I have tried shutil and dbutils, but shutil does not recognize the abfss path, whereas dbutils does not recognize the /tmp path.

Any suggestions would be very helpful.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Ashwin_DSA
Databricks Employee

Hi @deepu,

The reason it isn't working is that Python’s shutil only understands local/POSIX-style paths, not abfss:// URIs, and dbutils.fs expects Databricks-style paths (e.g., file:/..., /Volumes/...). 

The recommended pattern for this is:

  1. Write reports to local disk (what you already do: /tmp).
  2. Copy from local disk to a Unity Catalog volume (backed by your external location).
  3. Always reference the volume via /Volumes/... paths inside Databricks, not raw abfss:// URIs from the Python stdlib.

On Unity Catalog enabled clusters, you’re dealing with two different file systems:

  • /tmp is ephemeral local storage on the driver.
  • abfss://... is remote cloud storage.

First, back your external location with a UC volume if you haven't already. For example:

CREATE EXTERNAL VOLUME main.reporting.reports_archive
  LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/<path>';

Make sure the user/service principal has USE CATALOG on the catalog, USE SCHEMA on the schema, and WRITE VOLUME on this volume.
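If any of those privileges are missing, they can be granted with SQL along these lines (the catalog, schema, and volume names match the example above; the principal is a placeholder you'd swap for your own user or service principal):

```sql
GRANT USE CATALOG ON CATALOG main TO `user@example.com`;
GRANT USE SCHEMA ON SCHEMA main.reporting TO `user@example.com`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME main.reporting.reports_archive TO `user@example.com`;
```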

 
You can then copy from /tmp to the volume you created above:
dbutils.fs.cp(
    "file:/tmp/my_report.xlsx",
    "/Volumes/main/reporting/reports_archive/my_report.xlsx"
)
(Note: dbutils.fs.cp's optional third argument is recurse, for copying directories; it is not needed for a single file.)
The file:/ prefix is required for dbutils.fs to see local/ephemeral storage, and /Volumes/... is the POSIX-style path for your UC volume.
 
Alternatively, you can use shutil against the volume path (not abfss):
 
from shutil import copyfile

copyfile(
    "/tmp/my_report.xlsx",
    "/Volumes/main/reporting/reports_archive/my_report.xlsx"
)
Once the file is in the volume, it’s stored in your external location and governed by Unity Catalog, which should cover your archive requirement.
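Since the notebook generates a batch of Excel and PDF reports, the copy step can be wrapped in a small helper that archives everything in one pass. A minimal sketch using only the stdlib (the /Volumes destination in the usage comment is just the example path from above; point it at your own volume):

```python
import glob
import os
import shutil

def archive_reports(src_dir, dest_dir, patterns=("*.xlsx", "*.pdf")):
    """Copy every file in src_dir matching one of the patterns into dest_dir.

    Returns the list of destination paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in glob.glob(os.path.join(src_dir, pattern)):
            dest = os.path.join(dest_dir, os.path.basename(src))
            shutil.copyfile(src, dest)  # plain POSIX copy; works for /Volumes paths
            copied.append(dest)
    return copied

# Example (hypothetical volume path):
# archive_reports("/tmp", "/Volumes/main/reporting/reports_archive")
```

This works because UC volumes are mounted as ordinary POSIX paths under /Volumes, so the stdlib needs no cloud-specific code.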
 
Try this and let me know if you have further questions.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

