Hey @0000abcd , short answer: there isn’t a Databricks-imposed single-file size cap for files in managed volumes; the practical limit is whatever the underlying cloud object storage supports. You can write very large files via Spark, the Files REST API, SDKs, or CLI. For uploads/downloads in the UI, the per-file limit is 5 GB, so use programmatic methods for larger files.
What’s the actual limit?
- Volumes themselves don’t cap file size; they support files up to the maximum size supported by your cloud storage provider. Use Spark, the Databricks Files REST API, SDKs, or CLI for large files.
- The Catalog Explorer UI upload/download workflow is limited to 5 GB per file, which is why large transfers should go through the API, SDK, or CLI instead.
- Don't confuse volumes with workspace files (the /Workspace file system), which have a 500 MB per-file limit; volumes are separate and meant for large, non-tabular assets.
Likely cause of your “Input/output error” and partial 14.8 GB copy
When copying very large files from ephemeral/driver-local storage through FUSE paths, long-running single-stream transfers can fail due to transient I/O issues or timeouts. Using the Files API/SDK/CLI avoids those UI/FUSE constraints and is the recommended path for multi-GB objects.
Recommended ways to move a 30 GB file into a managed volume
- Databricks CLI (fs commands): use databricks fs cp to copy the file to a volume path like /Volumes/<catalog>/<schema>/<volume>/<dir>/<file>. This uses volume-aware operations and handles large files better than the UI.
- Files REST API (PUT/GET): example PUT to a managed volume path:

  ```bash
  curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin?overwrite=true" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data-binary @./myfile.bin
  ```
- Databricks SDKs (Python/Java/Go): use WorkspaceClient.files to upload/download to/from /Volumes/.... This interface is designed for files in volumes and supports large objects programmatically (see the Python sketch after this list).
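A minimal Python sketch of the SDK route, assuming the databricks-sdk package is installed and authentication comes from the environment (DATABRICKS_HOST/DATABRICKS_TOKEN) or a config profile; the local filename and volume path are placeholders:

```python
from databricks.sdk import WorkspaceClient

# Picks up credentials from the environment or ~/.databrickscfg.
w = WorkspaceClient()

local_path = "./myfile.bin"  # placeholder: your large source file
volume_path = "/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"  # placeholder target

# Stream the file into the managed volume; overwrite=True replaces any existing copy.
with open(local_path, "rb") as f:
    w.files.upload(volume_path, f, overwrite=True)
```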
Verification steps after upload
- List the target file with dbutils.fs.ls("dbfs:/Volumes/<catalog>/<schema>/<volume>/<dir>/") and confirm the size matches the source.
- Optionally compute a checksum locally and in Databricks to ensure integrity for very large transfers (a sketch follows below).
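For the checksum step, one simple Python sketch you can run both on the source machine and in a notebook against the uploaded copy (reading it through its /Volumes path); the paths are placeholders:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute a SHA-256 digest in chunks so a multi-GB file never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Locally, against the source file:
# print(sha256_of("./myfile.bin"))
# In a Databricks notebook, against the uploaded file:
# print(sha256_of("/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"))
# The two digests should match.
```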
If you were using the UI or copying via a FUSE path, retry with the CLI or Files API/SDK and it should handle your 30 GB file.
Hope this helps, Louis.