Hey @0000abcd , short answer: there isn’t a Databricks-imposed single-file size cap for files in managed volumes; the practical limit is whatever the underlying cloud object storage supports. You can write very large files via Spark, the Files REST API, SDKs, or CLI. For uploads/downloads in the UI, the per-file limit is 5 GB, so use programmatic methods for larger files.
What’s the actual limit?
- Volumes themselves don’t cap file size; they support files up to the maximum size supported by your cloud storage provider. Use Spark, the Databricks Files REST API, SDKs, or CLI for large files.
- The Catalog Explorer UI upload/download workflow is limited to 5 GB per file, which is why large transfers should go through the API, SDK, or CLI instead.
- Don't confuse volumes with workspace files (the /Workspace file system), which have a 500 MB per-file limit; volumes are separate and meant for large, non-tabular assets.
Likely cause of your “Input/output error” and partial 14.8 GB copy
When copying very large files from ephemeral/driver-local storage through FUSE paths, long-running single-stream transfers can fail due to transient I/O issues or timeouts. Using the Files API/SDK/CLI avoids those UI/FUSE constraints and is the recommended path for multi-GB objects.
Recommended ways to move a 30 GB file into a managed volume
- Databricks CLI (fs commands): use databricks fs cp to copy the file to a volume path like /Volumes/<catalog>/<schema>/<volume>/<dir>/<file>. This uses volume-aware operations and handles large files better than the UI.
- Files REST API (PUT/GET): example PUT to a managed volume path:

  ```bash
  curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin?overwrite=true" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data-binary @./myfile.bin
  ```
- Databricks SDKs (Python/Java/Go): use WorkspaceClient.files to upload/download to/from /Volumes/.... This interface is designed for files in volumes and supports large objects programmatically (see the Python sketch after this list).
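A minimal Python sketch of the SDK route, assuming the databricks-sdk package is installed and authentication comes from the environment (DATABRICKS_HOST/DATABRICKS_TOKEN) or a config profile; the local filename and volume path are placeholders:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials from the environment or ~/.databrickscfg.
w = WorkspaceClient()

local_path = "./myfile.bin"  # placeholder: your large source file
volume_path = "/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"  # placeholder target

# Stream the file into the managed volume; overwrite=True replaces any existing copy.
with open(local_path, "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)
```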
Verification steps after upload
- List the target file with dbutils.fs.ls("dbfs:/Volumes/<catalog>/<schema>/<volume>/<dir>/") and confirm the size matches the source.
- Optionally compute a checksum locally and in Databricks to ensure integrity for very large transfers (a sketch follows below).
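For the checksum step, one simple Python sketch you can run both on the source machine and in a notebook against the uploaded copy (reading it through its /Volumes path); the paths are placeholders:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute a SHA-256 digest in chunks so a multi-GB file never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Locally, against the source file:
# print(sha256_of("./myfile.bin"))
# In a Databricks notebook, against the uploaded file:
# print(sha256_of("/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"))
# The two digests should match.
```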
If you were using the UI or copying via a FUSE path, retry with the CLI or Files API/SDK and it should handle your 30 GB file.
Hope this helps, Louis.