Upload file from local file system to Unity Catalog Volume (via databricks-connect)

Husky
New Contributor III

Context:
IDE: IntelliJ 2023.3.2
Library: databricks-connect 13.3
Python: 3.10

Description:
I develop notebooks and Python scripts locally in the IDE and connect to the Spark cluster via databricks-connect for a better developer experience.

I download a file from the public internet and want to store it in an external Unity Catalog Volume (hosted on S3). I would like to upload the file using the Volume path rather than uploading it directly to S3 with AWS credentials.

Everything works fine in a Databricks notebook, e.g.:

dbutils.fs.cp("<local/file/path>", "/Volumes/<path>")

or:

source_file = ...  # bytes of the downloaded file
with open("/Volumes/<path>", 'wb') as destination_file:
    destination_file.write(source_file)

I can't find a way to do the same from my local IDE.
Using dbutils:

dbutils.fs.cp("file:/<local/path>", "/Volumes/<path>")

I get the error:

databricks.sdk.errors.mapping.InvalidParameterValue: Path must be absolute: \Volumes\<path>

Using Python's with open(...) won't work either, because the Unity Catalog Volume is not mounted on my local machine.

Is there a way to upload files from the local machine or memory into Unity Catalog Volumes?

1 ACCEPTED SOLUTION


lathaniel
New Contributor III

Late to the discussion, but I too was looking for a way to do this _programmatically_, as opposed to the UI.

The solution I landed on was using the Python SDK (though you could assuredly do this using an API request instead if you're not in Python):

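import io
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()  # uses your configured auth, e.g. ~/.databrickscfg or environment variables
w.files.upload('/your/volume/path/foo.txt', io.BytesIO(b'foo bar'))  # contents: a binary file-like object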
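For the use case in the question (download a file from the public internet, then store it in a Volume), the two steps can be combined without writing to the local disk. This is a minimal sketch, assuming `client` is a `databricks.sdk.WorkspaceClient` (any object exposing a compatible `files.upload(file_path, contents)` works); the helper name `download_to_volume` is hypothetical, not part of the SDK.

```python
import io
import urllib.request


def download_to_volume(client, url: str, volume_path: str) -> int:
    """Fetch `url` into memory and upload the bytes to a Unity Catalog Volume path.

    Assumes `client` is a databricks.sdk.WorkspaceClient (or any object with a
    compatible `files.upload(file_path, contents)` method). Returns the byte count.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # the whole file is held in memory
    # files.upload takes a binary file-like object, so wrap the bytes in BytesIO
    client.files.upload(volume_path, io.BytesIO(data))
    return len(data)
```

Usage (URL and path are placeholders): `download_to_volume(WorkspaceClient(), "https://example.com/data.csv", "/Volumes/<path>/data.csv")`. Reading into memory keeps the sketch simple; for very large files, downloading to a temporary file first is the safer choice.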


4 REPLIES

Husky
New Contributor III

Thanks for your answer, but I want to upload the files/data programmatically, not manually through the Databricks UI.


Husky
New Contributor III

Thanks, that's what I was looking for.

That said, it would be nice to pass just the path of the file to upload instead of reading its bytes myself.
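One way to get a path-based call with the SDK is a small wrapper that opens the file and hands the open handle to `files.upload`, so you never build a bytes object yourself. A sketch under the same assumption that `client` is a `databricks.sdk.WorkspaceClient`; `upload_local_file` is a hypothetical helper, not an SDK method.

```python
from pathlib import Path


def upload_local_file(client, local_path: str, volume_path: str) -> None:
    """Upload a local file to a Unity Catalog Volume path, given only its path.

    Assumes `client` is a databricks.sdk.WorkspaceClient. The open file handle is
    passed to `client.files.upload` instead of the file's contents.
    """
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(local_path)
    with path.open("rb") as f:
        client.files.upload(volume_path, f)
```

Usage (paths are placeholders): `upload_local_file(WorkspaceClient(), "C:/data/foo.csv", "/Volumes/<path>/foo.csv")`.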

dkushari
Databricks Employee

Hey Husky,

You can pass just the path of the file to upload by calling the REST API directly: https://docs.databricks.com/api/workspace/files/upload. It's in Public Preview. Please see below.

import re
import requests

# This code reads the host and token from the notebook context, so it only runs inside a Databricks notebook.
def return_ws_url():
    workspace_url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().get("browserHostName")
    # The tag is a Scala Option rendered as "Some(<host>)"; extract the host
    match = re.match(r'Some\((.*)\)', str(workspace_url))
    if match:
        return match.group(1)
    raise ValueError("browserHostName not found in the notebook context")

def upload_ws_file_to_volume(local_path, remote_path):
    # Stream the local file as the body of PUT /api/2.0/fs/files<remote_path>
    with open(local_path, 'rb') as f:
        r = requests.put(
            'https://{databricks_instance}/api/2.0/fs/files{path}'.format(
                databricks_instance=return_ws_url(), path=remote_path),
            headers=headers,
            data=f)
    r.raise_for_status()

# Notebook-scoped API token; don't print these headers, they contain a credential
headers = {'Authorization': 'Bearer {}'.format(dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get())}

upload_ws_file_to_volume("<<Your source file local path>>", "<<UC Volume path>>")
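The helper in this reply gets the workspace host and token from the notebook context, which does not exist in a local IDE. From a local machine, the same PUT /api/2.0/fs/files{path} call can be made with a host and personal access token you supply yourself. A minimal standard-library sketch, assuming both values are in the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables (the names the Databricks CLI and SDK also read); `files_api_url` and `upload_via_rest` are hypothetical names.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional


def files_api_url(host: str, volume_path: str) -> str:
    """Build the Files API URL for a Volume path such as /Volumes/<catalog>/<schema>/<volume>/f.txt."""
    host = host.rstrip("/")
    if not host.startswith(("https://", "http://")):
        host = "https://" + host
    # Percent-encode the path (spaces etc.) while keeping the '/' separators
    return f"{host}/api/2.0/fs/files{urllib.parse.quote(volume_path)}"


def upload_via_rest(local_path: str, volume_path: str,
                    host: Optional[str] = None, token: Optional[str] = None) -> int:
    """PUT a local file to a Unity Catalog Volume and return the HTTP status code.

    Assumes DATABRICKS_HOST / DATABRICKS_TOKEN are set when host/token are omitted.
    Reads the whole file into memory, which is fine for small files.
    """
    host = host or os.environ["DATABRICKS_HOST"]
    token = token or os.environ["DATABRICKS_TOKEN"]
    with open(local_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        files_api_url(host, volume_path),
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage (paths are placeholders): `upload_via_rest("C:/data/foo.csv", "/Volumes/<path>/foo.csv")`. Keeping the URL builder separate makes the path handling easy to check without a workspace.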
