02-08-2024 02:16 AM
Context:
IDE: IntelliJ 2023.3.2
Library: databricks-connect 13.3
Python: 3.10
Description:
I develop notebooks and Python scripts locally in the IDE and connect to the Spark cluster via databricks-connect for a better developer experience.
I download a file from the public internet and want to store it in an external Unity Catalog Volume (hosted on S3). I would like to upload the file using a Volume path rather than uploading it directly to S3 with AWS credentials.
Everything works fine in a Databricks notebook, e.g.:
dbutils.fs.cp("<local/file/path>", "/Volumes/<path>")
or:
source_file = ...
with open("/Volumes/<path>", 'wb') as destination_file:
    destination_file.write(source_file)
I can't figure out a way to do that locally in my IDE.
Using dbutils:
dbutils.fs.cp("file:/<local/path>", "/Volumes/<path>")
I get the error:
databricks.sdk.errors.mapping.InvalidParameterValue: Path must be absolute: \Volumes\<path>
Using Python's with statement won't work, because the Unity Catalog Volume is not mounted on my local machine.
Is there a way to upload files from the local machine or memory into Unity Catalog Volumes?
Accepted Solutions
04-02-2024 11:20 AM
Late to the discussion, but I too was looking for a way to do this _programmatically_, as opposed to the UI.
The solution I landed on was using the Python SDK (though you could assuredly do this using an API request instead if you're not in Python):
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.files.upload('/your/volume/path/foo.txt', 'foo bar')
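If you want to upload an actual local file rather than an in-memory string, a minimal sketch along these lines should work (paths are placeholders, and the client reads the workspace host and token from the usual environment variables or a config profile):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

local_path = "downloaded_file.csv"  # placeholder: the file you downloaded locally
volume_path = "/Volumes/<catalog>/<schema>/<volume>/downloaded_file.csv"  # placeholder Volume path

# Stream the local file's bytes into the Volume via the Files API
with open(local_path, "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)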
02-12-2024 01:52 AM
Thanks for your answer, but I want to upload the files/data programmatically, not manually through the Databricks UI.
05-06-2024 02:30 AM
Thanks, that's what I was looking for.
Although it would be nice to be able to provide just the path of the file to upload instead of reading its binary contents first.
05-11-2024 11:59 AM
Hey Husky,
You can provide just the path to the file to upload with a REST API call: https://docs.databricks.com/api/workspace/files/upload. It's in Public Preview. Please see below.
import re
import requests

def return_ws_url():
    # Read the workspace host name from the notebook context
    workspace_url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().get("browserHostName")
    match = re.match(r'Some\((.*)\)', str(workspace_url))
    if match:
        value = match.group(1)
        return value
    else:
        print("No value found")

def upload_ws_file_to_volume(local_path, remote_path):
    # PUT the raw file bytes to the Files API endpoint for the target Volume path
    with open(local_path, 'rb') as f:
        r = requests.put(
            'https://{databricks_instance}/api/2.0/fs/files{path}'.format(
                databricks_instance=return_ws_url(), path=remote_path),
            headers=headers,
            data=f)
        r.raise_for_status()

headers = {'Authorization': 'Bearer {}'.format(dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get())}
print(headers)

upload_ws_file_to_volume(<<Your source file local path>>, <<UC Volume path>>)
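One note for anyone running this from a local IDE: return_ws_url() and the apiToken() call rely on the notebook context, so the snippet above only works inside a Databricks notebook. A rough local equivalent (just a sketch; the environment variable names are my own assumption) is to supply the workspace URL and a personal access token yourself:

import os
import requests

# Assumed environment variables: the full workspace URL (including https://) and a personal access token
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

def upload_local_file_to_volume(local_path, remote_path):
    # remote_path is an absolute Volume path, e.g. /Volumes/<catalog>/<schema>/<volume>/<file>
    with open(local_path, 'rb') as f:
        r = requests.put(
            '{host}/api/2.0/fs/files{path}'.format(host=host, path=remote_path),
            headers={'Authorization': 'Bearer {}'.format(token)},
            data=f)
    r.raise_for_status()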

