Upload of .bin file >400 MB
09-05-2024 03:16 AM
I'm trying to upload a file with a .bin extension to a local workspace folder; I need the file available locally.
I tried loading it from DBFS, but loading files over 265 MB is not allowed with my cluster.
I then tried to upload it manually, but that failed with the error "OSError: [Errno 5] Input/output error".
I also tried compressing the file, but since it is a binary file, compression did not reduce its size significantly, and the manual upload failed the same way as before.
How can I upload that .bin file to the local storage of my workspace?
- Labels:
  - Delta Lake
  - Spark
09-05-2024 03:31 AM - edited 09-05-2024 03:34 AM
Hi @oleh_v ,
You can try the approach below; it supports files up to 2 GB. For example, using curl:
# Parameters
databricks_workspace_url="<databricks-workspace-url>"
personal_access_token="<personal-access-token>"
local_file_path="<local_file_path>" # ex: /Users/foo/Desktop/file_to_upload.png
dbfs_file_path="<dbfs_file_path>" # ex: /tmp/file_to_upload.png
overwrite_file="<true|false>"
curl --location --request POST https://${databricks_workspace_url}/api/2.0/dbfs/put \
--header "Authorization: Bearer ${personal_access_token}" \
--form contents=@${local_file_path} \
--form path=${dbfs_file_path} \
--form overwrite=${overwrite_file}
Upload large files using DBFS API 2.0 and PowerShell - Databricks
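If curl is not convenient, the same multipart upload can also be done from Python. This is only a minimal sketch, assuming the requests library is installed; it targets the same /api/2.0/dbfs/put endpoint and uses the same placeholder values as the curl example above.
import requests

# Same placeholders as the curl example above
databricks_workspace_url = "<databricks-workspace-url>"
personal_access_token = "<personal-access-token>"
local_file_path = "<local_file_path>"  # e.g. /Users/foo/Desktop/file_to_upload.png
dbfs_file_path = "<dbfs_file_path>"    # e.g. /tmp/file_to_upload.png

# Multipart POST to the DBFS put endpoint (this form of the API supports files up to 2 GB)
with open(local_file_path, "rb") as f:
    response = requests.post(
        f"https://{databricks_workspace_url}/api/2.0/dbfs/put",
        headers={"Authorization": f"Bearer {personal_access_token}"},
        files={"contents": f},
        data={"path": dbfs_file_path, "overwrite": "true"},
    )
response.raise_for_status()
Note that the script reads the file from the machine it runs on, so it has to be executed on the computer that holds the local file, not inside a notebook.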
09-06-2024 12:44 PM
Hello Slash,
Thank you for your response. I'm encountering the same issue as described. I tried running the provided code in my Databricks workspace, but I received an error. My question is how the script is expected to access local files, especially since the file I’m trying to upload ("pytorch_model.bin") has both read and write permissions on my local machine.
Additionally, once I successfully upload the file to DBFS using the API, can I access and read the file directly from my code, or will I need to use a separate API to retrieve it?
Any guidance or clarification on these points would be greatly appreciated.
# Parameters
databricks_workspace_url="<url>"
personal_access_token="<token>"
local_file_path="C:\\Users\\<Username>\\Downloads\\pytorch_model.bin"
dbfs_file_path="pytorch_model.bin"
overwrite_file="true"
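Regarding reading the file back after upload: on clusters that expose the DBFS FUSE mount at /dbfs (not available on every cluster type), a file uploaded via the API can usually be opened directly from notebook code, without a separate download API. A minimal sketch, assuming the file ended up at dbfs:/tmp/pytorch_model.bin:
# Assumes the upload target was dbfs:/tmp/pytorch_model.bin and the /dbfs FUSE mount is available
with open("/dbfs/tmp/pytorch_model.bin", "rb") as f:
    header = f.read(16)  # plain file I/O works here; no separate retrieval API is needed
print(header)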

