Databricks Free Edition Help

Max. file size in a managed volume

0000abcd
New Contributor

Tried to move a ~30 GB file (I know it's too large for data science) from ephemeral storage to a managed volume, but after a while the operation returned "Input/output error"; I later discovered that only ~14.8 GB had been copied. Is there a maximum size for a single file in managed volumes?

1 REPLY

Louis_Frolio
Databricks Employee

Hey @0000abcd, short answer: there isn’t a Databricks-imposed single-file size cap for files in managed volumes; the practical limit is whatever the underlying cloud object storage supports. You can write very large files via Spark, the Files REST API, SDKs, or the CLI. For uploads/downloads in the UI, the per-file limit is 5 GB, so use programmatic methods for larger files.

 

What’s the actual limit?

  • Volumes themselves don’t cap file size; they support files up to the maximum size supported by your cloud storage provider. Use Spark, the Databricks Files REST API, SDKs, or CLI for large files.
  • The Catalog Explorer UI upload/download workflow is limited to 5 GB per file, which is why large transfers should go through API/SDK/CLI instead.
  • Don’t confuse volumes with workspace files (the /Workspace file system), which have a 500 MB per-file limit; volumes are separate and meant for large, non-tabular assets.

Likely cause of your “Input/output error” and partial 14.8 GB copy

When copying very large files from ephemeral/driver-local storage through FUSE paths, long-running single-stream transfers can fail due to transient I/O issues or timeouts. Using the Files API/SDK/CLI avoids those UI/FUSE constraints and is the recommended path for multi-GB objects.
 

Recommended ways to move a 30 GB file into a managed volume

  • Databricks CLI (fs commands): Use the CLI to copy the file to a volume path like /Volumes/<catalog>/<schema>/<volume>/<dir>/<file> (for example, databricks fs cp <local-file> dbfs:/Volumes/<catalog>/<schema>/<volume>/<dir>/<file>). This uses volume-aware operations and handles large files better than the UI.
  • Files REST API (PUT/GET): Example PUT to a managed volume path:

        curl --request PUT "https://${DATABRICKS_HOST}/api/2.0/fs/files/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin?overwrite=true" \
          --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
          --data-binary @./myfile.bin
  • Databricks SDKs (Python/Java/Go): Use WorkspaceClient.files to upload/download to/from /Volumes/.... This is designed for files in volumes and supports large objects programmatically; see the Python sketch after this list.
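
A minimal Python sketch of the SDK route, assuming the Databricks SDK for Python (databricks-sdk) is installed, authentication is already configured (e.g. DATABRICKS_HOST and DATABRICKS_TOKEN), and the local path and /Volumes path are placeholders you'd replace:

    # Upload a large local file to a managed volume via the SDK's Files API.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    local_path = "./myfile.bin"  # file sitting on ephemeral/driver-local storage
    volume_path = "/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"

    # Pass an open binary stream so the ~30 GB file is streamed rather than loaded into memory.
    with open(local_path, "rb") as f:
        w.files.upload(volume_path, f, overwrite=True)

The download direction works the same way with w.files.download(volume_path), which returns a streaming response you can read in chunks.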

Verification steps after upload

  • List and check the target file: dbutils.fs.ls("dbfs:/Volumes/<catalog>/<schema>/<volume>/<dir>/") and confirm the size matches the source.
  • Optionally compute a checksum locally and in Databricks to ensure integrity for very large transfers; see the sketch after this list.
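
A quick sketch of both checks from a notebook, assuming the same placeholder /Volumes path as above (dbutils is available in Databricks notebooks, and the open() call reads the file through the volume's FUSE path):

    import hashlib

    target = "/Volumes/<catalog>/<schema>/<volume>/<dir>/myfile.bin"

    # 1) Size check: FileInfo.size should match the source file's size in bytes.
    info = dbutils.fs.ls(target)[0]
    print(info.name, info.size)

    # 2) Optional integrity check: hash the file in chunks through the FUSE path and
    #    compare the digest with one computed where the file originated.
    h = hashlib.sha256()
    with open(target, "rb") as f:
        for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
            h.update(chunk)
    print(h.hexdigest())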
If you were using the UI or copying via a FUSE path, retry with the CLI or Files API/SDK and it should handle your 30 GB file.
 
Hope this helps, Louis.
