Max. file size in a managed volume

0000abcd
New Contributor II

I tried to move a ~30 GB file (I know it's too large for data science) from ephemeral storage to a managed volume, but after a while the command returned "Input/output error"; I later discovered that only ~14.8 GB had been transferred. Is there a maximum size for a single file in managed volumes?

Louis_Frolio
Databricks Employee

Hey @0000abcd, short answer: there isn’t a Databricks-imposed single-file size cap for files in managed volumes; the practical limit is whatever the underlying cloud object storage supports. You can write very large files via Spark, the Files REST API, SDKs, or CLI. For uploads/downloads in the UI, the per-file limit is 5 GB, so use programmatic methods for larger files.

 

What’s the actual limit?

  • Volumes themselves don’t cap file size; they support files up to the maximum size supported by your cloud storage provider. Use Spark, the Databricks Files REST API, SDKs, or CLI for large files.
  • The Catalog Explorer UI upload/download workflow is limited to 5 GB per file, which is why large transfers should go through API/SDK/CLI instead.
  • Don’t confuse volumes with workspace files (the /Workspace file system), which have a 500 MB per-file limit; volumes are separate and meant for large, non-tabular assets.
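As a rough illustration of the limits listed above, here is a minimal Python sketch that checks a file size against them before choosing a transfer method. The function and constant names are hypothetical. The byte values are assumptions: "5 GB" is read as 5 GiB because the error message quoted later in this thread reports a limit of 5368709120 bytes, and "500 MB" is read as 500 MiB, which the thread does not confirm.

```python
import os

# Assumption: "5 GB" means 5 GiB, matching the 5368709120-byte limit
# shown in the error message quoted later in this thread.
UI_LIMIT_BYTES = 5 * 1024**3

# Assumption: "500 MB" is read as 500 MiB; the exact byte value is not stated here.
WORKSPACE_FILE_LIMIT_BYTES = 500 * 1024**2


def fits_ui_upload(size_bytes: int) -> bool:
    """Return True if a file of this size is within the assumed UI per-file limit."""
    return size_bytes <= UI_LIMIT_BYTES


def fits_workspace_file(size_bytes: int) -> bool:
    """Return True if a file of this size is within the assumed workspace-file limit."""
    return size_bytes <= WORKSPACE_FILE_LIMIT_BYTES


def local_size(path: str) -> int:
    """Return the size of a local file in bytes."""
    return os.path.getsize(path)
```

For example, fits_ui_upload(local_size("./myfile.bin")) would be False for the 30 GB file in the question, so that file needs one of the programmatic routes.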

Likely cause of your “Input/output error” and partial 14.8 GB copy

When copying very large files from ephemeral/driver-local storage through FUSE paths, long-running single-stream transfers can fail due to transient I/O issues or timeouts. Using the Files API/SDK/CLI avoids those UI/FUSE constraints and is the recommended path for multi-GB objects.
 

Recommended ways to move a 30 GB file into a managed volume

  • Databricks CLI (fs commands): Use the CLI's fs cp command to copy the file to a volume path like /Volumes/<catalog>/<schema>/<volume>/<dir>/<file>. This uses volume-aware operations and handles large files better than the UI.
  • Files REST API (PUT/GET): Example PUT to a managed volume path:

        curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin?overwrite=true" \
          --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
          --data-binary @./myfile.bin
  • Databricks SDKs (Python/Java/Go): Use WorkspaceClient.files to upload/download to/from /Volumes/.... This is designed for files in volumes and supports large objects programmatically.
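
The SDK route above could look roughly like the following Python sketch. It assumes the databricks-sdk package is installed and that authentication is already configured through environment variables or a config profile. The helper name upload_to_volume is hypothetical, and the volume path keeps the placeholders from this post. Whether a single upload call accepts a file above 5 GiB may depend on the SDK version, so treat this as a starting point rather than a guaranteed fix.

```python
from pathlib import Path


def upload_to_volume(client, local_path: str, volume_path: str) -> int:
    """Stream a local file to a volume path and return the local size in bytes.

    `client` is expected to expose `files.upload(file_path, contents, overwrite=...)`,
    as databricks.sdk.WorkspaceClient does.
    """
    source = Path(local_path)
    # Pass an open binary stream so the file is not read into memory in one piece.
    with source.open("rb") as stream:
        client.files.upload(volume_path, stream, overwrite=True)
    return source.stat().st_size


def main() -> None:
    # Imported here so the helper above can be used or tested without the SDK installed.
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient()  # picks up host and token from env vars or a profile
    size = upload_to_volume(
        client,
        "./myfile.bin",
        "/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin",  # placeholders
    )
    print(f"Uploaded {size} bytes")
```

Call main() from a notebook cell or script once the placeholders are replaced with real catalog, schema, and volume names.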

Verification steps after upload

  • List and check the target file: dbutils.fs.ls("dbfs:/Volumes/<catalog>/<schema>/<volume>/<dir>/") and confirm the size matches the source.
  • Optionally compute a checksum locally and in Databricks to ensure integrity for very large transfers.

If you were using the UI or copying via a FUSE path, retry with the CLI or Files API/SDK and it should handle your 30 GB file.
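
The size and checksum checks above can be sketched in plain Python. This is one possible approach, not a Databricks-specific API: the helper names are hypothetical, SHA-256 is an arbitrary choice of hash, and the target is assumed to be readable as a regular file path (for example a /Volumes/... path on a cluster).

```python
import hashlib
import os


def file_fingerprint(path: str, chunk_size: int = 8 * 1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-GB file never sits in memory at once.
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def same_file(source: str, target: str) -> bool:
    """Return True when both files have the same size and SHA-256 digest."""
    return file_fingerprint(source) == file_fingerprint(target)
```

A truncated copy, such as the 14.8 GB partial file described in the question, would fail this check on size alone.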
 
Hope this helps, Louis.

Then why does this message appear every time I try to upload a file of only 6-8 GB with the CLI?
"Error: Server received a request which exceeds maximum allowed content length. RequestSize(bytes): -1, Limit(bytes): 5368709120"