Managed vs. external volumes and tables

APJESK


From a creation perspective, the steps for managed and external volumes appear almost identical:

  1. Both require a storage credential
  2. Both require an external location
  3. Both point to customer-owned S3

So what exactly makes a volume “managed” vs “external”?

Why is it said that managed volumes are controlled by Databricks while external volumes are not, when:

  1. both physically live in customer-owned S3, and
  2. both are accessed using customer-defined IAM roles?

 

themahesh

Managed and external volumes can look the same because both store data in the customer’s S3 and are accessed through customer IAM roles. The real difference is who controls the data directory and its lifecycle.

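With a managed volume, you do not specify a path. Unity Catalog creates a directory under the managed storage location configured for the schema (falling back to the catalog's, then the metastore's) and controls it. The storage credential and external location are typically set up once, at catalog or schema level, to back that managed location, not for each volume. If the volume is dropped, Databricks also deletes the underlying files.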

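With an external volume, you register a path the customer already controls, and that path must fall under an external location. Databricks can read and write files there, but dropping the volume removes only its Unity Catalog metadata; the files stay in S3.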
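The difference shows up directly in the DDL. A minimal sketch, in Python, that renders the Databricks SQL for each kind: the managed form has no `LOCATION` clause, while the external form must name one. The helper functions, the volume names, and the `s3://my-bucket/...` path are hypothetical, and the functions are plain string builders, not a Databricks API.

```python
from typing import Optional


def create_volume_sql(full_name: str, location: Optional[str] = None) -> str:
    """Render a CREATE VOLUME statement for a catalog.schema.volume name.

    No location -> managed volume: Unity Catalog picks the directory under the
    schema's managed storage location.
    With a location -> external volume: the path must sit under an existing
    external location.
    Inputs are assumed trusted; nothing is escaped.
    """
    if location is None:
        return f"CREATE VOLUME IF NOT EXISTS {full_name}"
    return f"CREATE EXTERNAL VOLUME IF NOT EXISTS {full_name} LOCATION '{location}'"


def drop_volume_sql(full_name: str) -> str:
    """Same statement for both kinds; only what happens to the files differs."""
    return f"DROP VOLUME IF EXISTS {full_name}"


if __name__ == "__main__":
    # Hypothetical names and bucket, for illustration only.
    print(create_volume_sql("main.raw.landing"))
    # CREATE VOLUME IF NOT EXISTS main.raw.landing
    print(create_volume_sql("main.raw.vendor_drop", "s3://my-bucket/vendor/drop"))
    # CREATE EXTERNAL VOLUME IF NOT EXISTS main.raw.vendor_drop LOCATION 's3://my-bucket/vendor/drop'
```

In a notebook these strings could be passed to `spark.sql(...)`. Note that `DROP VOLUME` is identical for both kinds; whether the files survive depends on how the volume was created, not on the drop statement.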

In short: the data lives in the same kind of place, but ownership differs. For managed volumes, Unity Catalog controls the directory and its lifecycle; for external volumes, the customer does.
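To make the lifecycle difference concrete, here is a toy in-memory model. It is an illustrative sketch, not how Unity Catalog is implemented: a dict stands in for the customer's S3 bucket, the `volumes/<name>` layout under the managed root is invented, and files are deleted immediately, without modelling any delayed cleanup Databricks may apply.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyCatalog:
    """Toy model of volume lifecycle semantics (hypothetical, not a Databricks API)."""

    schema_managed_root: str  # stands in for the schema's managed storage location
    bucket: Dict[str, bytes] = field(default_factory=dict)  # object key -> contents
    volumes: Dict[str, Tuple[str, bool]] = field(default_factory=dict)  # name -> (path, managed?)

    def create_volume(self, name: str, location: Optional[str] = None) -> str:
        if location is None:
            # Managed: the catalog chooses the directory (invented layout).
            path = f"{self.schema_managed_root}/volumes/{name}"
            self.volumes[name] = (path, True)
        else:
            # External: the caller supplies a path it already controls.
            path = location
            self.volumes[name] = (path, False)
        return path

    def write(self, name: str, filename: str, data: bytes) -> None:
        path, _ = self.volumes[name]
        self.bucket[f"{path}/{filename}"] = data

    def drop_volume(self, name: str) -> None:
        path, managed = self.volumes.pop(name)  # the catalog entry always goes
        if managed:
            # Managed: dropping the volume also removes its files.
            prefix = path + "/"
            for key in [k for k in self.bucket if k.startswith(prefix)]:
                del self.bucket[key]
        # External: nothing else happens; the files stay in the bucket.


if __name__ == "__main__":
    cat = ToyCatalog("s3://customer-bucket/uc/raw")
    cat.create_volume("landing")  # managed
    cat.write("landing", "a.csv", b"1,2")
    cat.create_volume("vendor", "s3://customer-bucket/vendor")  # external
    cat.write("vendor", "b.csv", b"3,4")
    cat.drop_volume("landing")
    cat.drop_volume("vendor")
    print(sorted(cat.bucket))  # ['s3://customer-bucket/vendor/b.csv']
```

Both volumes wrote to the same bucket with the same access, yet only the external volume's file survives the drop, which is the ownership difference described above.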