Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Upload of .bin file >400mb

oleh_v
New Contributor

I am trying to upload a file with a .bin extension to a local workspace folder. The file is required locally.

I tried loading it from DBFS, but loading files over 265 MB is not allowed with the cluster.
I tried to upload it manually, but that failed with the error "OSError: [Errno 5] Input/output error".
I also tried compressing the file, but since it is a binary file, compression did not significantly reduce its size, and the manual upload failed as before.

How can I upload that .bin file to the local storage of my workspace?

2 REPLIES

szymon_dybczak
Contributor III

Hi @oleh_v ,

You can try the approach below; it supports files up to 2 GB. For example, using curl:

# Parameters
databricks_workspace_url="<databricks-workspace-url>"
personal_access_token="<personal-access-token>"
local_file_path="<local_file_path>"              # ex: /Users/foo/Desktop/file_to_upload.png
dbfs_file_path="<dbfs_file_path>"                # ex: /tmp/file_to_upload.png
overwrite_file="<true|false>"


curl --location --request POST "https://${databricks_workspace_url}/api/2.0/dbfs/put" \
     --header "Authorization: Bearer ${personal_access_token}" \
     --form "contents=@${local_file_path}" \
     --form "path=${dbfs_file_path}" \
     --form "overwrite=${overwrite_file}"

Upload large files using DBFS API 2.0 and PowerShell - Databricks
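The linked article describes the DBFS API 2.0 streaming flow (create / add-block / close), which is the usual route for files too large for a single request. A rough Python sketch of that flow, under the assumption that a personal access token and workspace host are available (`requests` is a third-party dependency, and error handling here is minimal):

```python
import base64


def chunk_file(path, chunk_size=1024 * 1024):
    """Yield base64-encoded chunks of a file; the DBFS add-block
    endpoint accepts at most 1 MB of raw data per call."""
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield base64.standard_b64encode(data).decode("ascii")


def upload_to_dbfs(host, token, local_path, dbfs_path, overwrite=True):
    """Stream a local file to DBFS via the create / add-block / close endpoints."""
    import requests  # third-party: pip install requests

    headers = {"Authorization": f"Bearer {token}"}
    api = f"https://{host}/api/2.0/dbfs"

    # Open a streaming upload handle.
    resp = requests.post(f"{api}/create", headers=headers,
                         json={"path": dbfs_path, "overwrite": overwrite})
    resp.raise_for_status()
    handle = resp.json()["handle"]

    # Append the file in <= 1 MB base64 blocks.
    for block in chunk_file(local_path):
        requests.post(f"{api}/add-block", headers=headers,
                      json={"handle": handle, "data": block}).raise_for_status()

    # Finalize the file.
    requests.post(f"{api}/close", headers=headers,
                  json={"handle": handle}).raise_for_status()
```

Because each block is sent in its own request, this avoids the single-call size limit that the multipart `put` form runs into.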

Kartheek_Katta
New Contributor II

Hello Slash,

Thank you for your response. I'm encountering the same issue as described. I tried running the provided code in my Databricks workspace, but I received an error. My question is how the script is expected to access local files, especially since the file I'm trying to upload ("pytorch_model.bin") has both read and write permissions on my local machine.

Additionally, once I successfully upload the file to DBFS using the API, can I access and read the file directly from my code, or will I need to use a separate API to retrieve it?
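On the second question: on a classic (non-serverless) Databricks cluster, DBFS is mounted locally at /dbfs via FUSE, so a file uploaded to, say, dbfs:/tmp/pytorch_model.bin can usually be read with ordinary file I/O from notebook code; no separate download API is needed. A small sketch of the path mapping (the helper name `dbfs_to_fuse` is made up for illustration):

```python
def dbfs_to_fuse(dbfs_path: str) -> str:
    """Translate a DBFS URI (dbfs:/...) or DBFS-absolute path
    to the local /dbfs FUSE mount path used on the cluster."""
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if not dbfs_path.startswith("/"):
        dbfs_path = "/" + dbfs_path
    return "/dbfs" + dbfs_path


# Example: on a cluster, torch.load(dbfs_to_fuse("dbfs:/tmp/pytorch_model.bin"))
# would read the uploaded weights directly.
```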

Any guidance or clarification on these points would be greatly appreciated.


# Parameters
databricks_workspace_url="<url>"
personal_access_token="<token>"
local_file_path="C:\\Users\\<Username>\\Downloads\\pytorch_model.bin"
dbfs_file_path="pytorch_model.bin"
overwrite_file="true"

(screenshot of the error attached)

 


