
Upload file from local file system to Unity Catalog Volume (via databricks-connect)

Husky
New Contributor II

Context:
IDE: IntelliJ 2023.3.2
Library: databricks-connect 13.3
Python: 3.10

Description:
I develop notebooks and Python scripts locally in the IDE and connect to the Spark cluster via databricks-connect for a better developer experience.

I download a file from the public internet and want to store it in an external Unity Catalog volume (hosted on S3). I would like to upload the file using a volume path rather than uploading it directly to S3 with AWS credentials.

Everything works fine in a Databricks notebook, for example:

dbutils.fs.cp("<local/file/path>", "/Volumes/<path>")

 

or:

 

source_file = ...
with open("/Volumes/<path>", 'wb') as destination_file:
    destination_file.write(source_file)

 

I can't figure out how to do the same locally from my IDE.
Using dbutils:

dbutils.fs.cp("file:/<local/path>", "/Volumes/<path>")

 

I get the error:

 

databricks.sdk.errors.mapping.InvalidParameterValue: Path must be absolute: \Volumes\<path>
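The backslashes in that message hint that the forward-slash volume path may have been rewritten with Windows separators on the client before it reached the API. That is only a guess from the error text, not something confirmed in this thread. A minimal sketch of how such a rewrite can happen with Python's pathlib (the volume path below is a hypothetical placeholder):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical volume path, used only for illustration.
volume_path = "/Volumes/catalog/schema/volume/file.bin"

# Windows-flavoured path objects render separators as backslashes.
as_windows = str(PureWindowsPath(volume_path))

# POSIX-flavoured path objects keep the forward slashes intact.
as_posix = str(PurePosixPath(volume_path))

print(as_windows)
print(as_posix)
```

If this is what happens, passing the volume path as a plain string with forward slashes (rather than through a Windows path object) avoids the rewrite.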

 

Writing to the volume path with Python's built-in open() won't work either, because the Unity Catalog volume is not mounted on my local machine.

Is there a way to upload files from the local machine or memory into Unity Catalog Volumes?

2 REPLIES

Kaniz
Community Manager

Hi @Husky, you can upload files from your local machine or memory into Unity Catalog volumes in Databricks.

 

Here are the steps to achieve this:

 

Ensure Prerequisites: Before you proceed, make sure you have the following:

 

  • A Databricks workspace with Unity Catalog enabled. If you haven’t set up Unity Catalog yet, refer to the documentation on getting started with Unity Catalog.
  • The necessary privileges:
    • WRITE VOLUME privilege on the target volume where you want to upload files.
    • USE SCHEMA privilege on the parent schema.
    • USE CATALOG privilege on the parent catalog.

Upload Files to Volume: Follow these steps to upload files to a Unity Catalog volume:

  • In your Databricks workspace, click New > Add Data.
  • Select Upload Files to Volume.
  • Choose a volume or a directory inside a volume, or paste a volume path.
  • Click the browse button or drag and drop files directly into the drop zone.

Additional Notes:

  • For semi-structured or structured files, you can use Auto Loader or COPY INTO to create tables from the uploaded files.
  • You can also run various machine learning and data science workloads on files within the volume.
  • Additionally, you can upload libraries, certificates, and other configuration files of arbitrary formats (e.g., .whl or .txt) that you want to use for configuring cluster libraries, notebook-scoped libraries, or job dependencies.

Remember that volumes are supported in Databricks Runtime 13.2 and above. If you encounter any issues, ensure you’re using a compatible runtime version. 

Husky
New Contributor II

Thanks for your answer. But I want to upload the files/data programmatically and not manually with the Databricks UI.
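
One programmatic option that may fit here is the Files API exposed by the Databricks SDK for Python through WorkspaceClient.files.upload. The sketch below is not from this thread. It assumes a databricks-sdk version recent enough to include the Files API and that authentication is already configured (for example through the same profile databricks-connect uses). The helper names upload_to_volume and upload_bytes_to_volume are hypothetical, and the paths are the placeholders from the question.

```python
import io


def upload_to_volume(client, local_path, volume_path, overwrite=True):
    """Upload a local file to a /Volumes/... path.

    `client` is assumed to be a databricks.sdk.WorkspaceClient, or any
    object exposing a compatible `files.upload(path, contents, overwrite=...)`.
    """
    if not volume_path.startswith("/Volumes/"):
        raise ValueError(f"Expected a /Volumes/... path, got: {volume_path}")
    # Stream the file in binary mode; the volume is never mounted locally.
    with open(local_path, "rb") as source_file:
        client.files.upload(volume_path, source_file, overwrite=overwrite)


def upload_bytes_to_volume(client, data, volume_path, overwrite=True):
    """Upload in-memory bytes (e.g. a downloaded payload) to a volume path."""
    if not volume_path.startswith("/Volumes/"):
        raise ValueError(f"Expected a /Volumes/... path, got: {volume_path}")
    client.files.upload(volume_path, io.BytesIO(data), overwrite=overwrite)


# Example usage (requires the databricks-sdk package and valid credentials):
#
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient()
#   upload_to_volume(w, "<local/file/path>", "/Volumes/<path>")
```

Taking the client as a parameter keeps the helpers independent of how authentication is set up. The same privileges as for a UI upload (WRITE VOLUME, USE SCHEMA, USE CATALOG) would presumably still be required.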
