Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to manipulate files in an external location?

Tjomme
New Contributor III

According to the documentation, the use of external locations is preferred over mount points.

Unfortunately, the basic functionality to manipulate files seems to be missing.

This is my scenario (a rough sketch of the whole flow follows the list):

  • create a download folder in an external location if it does not exist:
dbutils.fs.mkdirs(NewPath) does not work --> Operation failed: "This request is not authorized to perform this operation."
  • use API to download zip files from a source and write it to a mounted location using: 
f = open(fullFileName, 'w+b') --> FileNotFoundError: [Errno 2] No such file or directory
f.write(ZipBinaryData)
f.close()
  • loop over all zip files: dbutils.fs.ls does not work and needs to be replaced with the SQL LIST command
    • unzip them into an extract folder containing JSON files (not tested yet, but using zipfile.ZipFile(fullZipFileName))
    • load the JSON files into a (raw) managed table (should not be an issue)
    • further process the managed table (should not be an issue)
    • empty extract folder using
dbutils.fs.rm(NewPath,True) --> Operation failed: "This request is not authorized to perform this operation."
  • move zip file to archive folder using
dbutils.fs.mv(NewPath, ArchivePath, True) --> Operation failed: "This request is not authorized to perform this operation."
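
For clarity, this is roughly the flow I am trying to implement, with the failure points marked in comments; the API URL, storage account, and table name below are placeholders, not my real configuration:

%py
import requests
import zipfile

# Placeholder external location and folders (not the real storage account)
base_path     = "abfss://landingzone@<storageaccount>.dfs.core.windows.net/DEV"
download_path = f"{base_path}/download"
extract_path  = f"{base_path}/extract"
archive_path  = f"{base_path}/archive"

# 1. Create the download folder if it does not exist
#    --> fails with "This request is not authorized to perform this operation."
dbutils.fs.mkdirs(download_path)

# 2. Download a zip file from the source API and write it out
#    --> open() cannot address abfss:// paths, so this raises FileNotFoundError
full_file_name = f"{download_path}/export.zip"
zip_binary = requests.get("https://<source-api>/export.zip").content
with open(full_file_name, "w+b") as f:
    f.write(zip_binary)

# 3. Loop over the zip files --> dbutils.fs.ls fails; only the SQL LIST command works
for file_info in dbutils.fs.ls(download_path):
    # 4. Unzip into the extract folder (zipfile also expects a local or mounted path)
    with zipfile.ZipFile(file_info.path) as z:
        z.extractall(extract_path)

# 5./6. Load the JSON files into a raw managed table (table name is just an example)
spark.read.json(extract_path).write.saveAsTable("raw.landingzone_export")

# 7. Empty the extract folder --> same "not authorized" error
dbutils.fs.rm(extract_path, True)

# 8. Move the zip file to the archive folder --> same "not authorized" error
dbutils.fs.mv(full_file_name, f"{archive_path}/export.zip")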

Any help or insights on how to get this working with external locations is greatly appreciated!

1 ACCEPTED SOLUTION


Tjomme
New Contributor III

The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders...

The dbutils.fs functionality is now working.

As for the zipfile manipulation: it only works with local (or mounted) paths.

Workaround: copy files between local storage and abfss when required.

View solution in original post

7 REPLIES

etsyal1e2r3
Honored Contributor

Sounds like a cloud provider permission issue. Which one are you using, AWS or Azure? How are you connecting to blob storage: via an external location with a managed identity or a SAS token? The easiest way to test connectivity is to click "Test connection" within the external location tab under "Data" (bottom left). If that is successful, you should test a simple read of the file directory...

dbutils.fs.ls("<blob url>")

Anonymous
Not applicable

Hi @Tjomme Vergauwen

We haven't heard from you since the last response from @Tyler Retzlaff, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

Tjomme
New Contributor III

Hi,

We're using Azure.

External locations are created using a managed identity.

It's not a security issue as demonstrated below:

Same folder, different syntax to get the list of files: the first one works, the second one throws an error.

LIST 'abfss://landingzone@***.dfs.core.windows.net/DEV' --> works
 
%py
dbutils.fs.ls('abfss://landingzone@***.dfs.core.windows.net/DEV') --> throws error

etsyal1e2r3
Honored Contributor

That's really weird... Can you go into the external location in Databricks' Data tab and make sure your user has the right permissions?

Tjomme
New Contributor III

It seems my access rights on the storage account are in order, but the ones on the container are missing. Reference: DataBricks UnityCatalog create table fails with "Failed to acquire a SAS token UnauthorizedAccessExc...

I'll have this changed and retry.

etsyal1e2r3
Honored Contributor

Cool, let me know how it goes.

Tjomme
New Contributor III

The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders...

The dbutils.fs functionality is now working.

As for the zipfile manipulation: it only works with local (or mounted) paths.

Workaround: copy files between local storage and abfss when required.
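
A minimal sketch of that workaround, assuming a staging directory on the driver under /tmp and placeholder abfss paths (not the real storage account):

%py
import zipfile

# Placeholder paths: files on the abfss external location and a local staging area on the driver
abfss_zip     = "abfss://landingzone@<storageaccount>.dfs.core.windows.net/DEV/download/export.zip"
abfss_extract = "abfss://landingzone@<storageaccount>.dfs.core.windows.net/DEV/extract/"
local_zip     = "/tmp/export.zip"
local_extract = "/tmp/extract/"

# Copy the zip file from the external location to local driver storage
dbutils.fs.cp(abfss_zip, f"file:{local_zip}")

# zipfile works fine against the local copy
with zipfile.ZipFile(local_zip) as z:
    z.extractall(local_extract)

# Copy the extracted JSON files back to the external location
dbutils.fs.cp(f"file:{local_extract}", abfss_extract, True)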
