DatabricksTV
Community-produced videos to help you leverage Databricks in your Data & AI journey. Tune in to explore industry trends and real-world use cases from leading data practitioners.
Adyasha
Databricks Employee

Have you ever wondered what the difference between managed and external tables is? In this quick video, Olivia Zhang, a Solutions Architect at Databricks, goes over the differences and uses an example to explain when to use each.

Target Audience: Data Engineers, Data Analysts, Data Scientists, BI Engineers

1 Comment
Shreyash_Gupta
New Contributor III

 

Here is a basic comparison of managed tables and external tables:

Managed Tables 

  • Managed tables are fully controlled by Databricks, including both the data and metadata lifecycle.
  • Data is stored in a Databricks-managed storage location configured by Unity Catalog.
  • These tables use the Delta Lake table format.
  • Dropping a managed table also deletes its underlying data files.
  • Ideal for Databricks-native workflows where Databricks handles the organization and lifecycle of data.
  • Metadata is maintained in Unity Catalog and tightly integrated with Databricks-managed storage.
  • Managed tables benefit from Databricks' built-in storage optimizations for better performance.
  • Fully integrated with Unity Catalog’s data governance features, including access controls for both metadata and data.
  • Migration involves additional effort because the data must be extracted and relocated from the Databricks-managed storage location.
  • Best suited for new projects built entirely within Databricks.
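
As a rough illustration of the points above, here is a minimal Python sketch that builds the SQL for a managed table. It assumes a Unity Catalog workspace; the helper `managed_table_ddl` and the table name `main.demo.orders` are hypothetical and only serve the example. The key detail is that the statement has no LOCATION clause, so Databricks picks the storage path itself.

```python
def managed_table_ddl(full_name, columns):
    """Build a CREATE TABLE statement for a managed table.

    There is deliberately no LOCATION clause: the storage path is
    left to the Databricks-managed location.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {full_name} ({cols})"


# Hypothetical three-level name: catalog.schema.table
ddl = managed_table_ddl(
    "main.demo.orders",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
)
drop = "DROP TABLE IF EXISTS main.demo.orders"

print(ddl)
print(drop)  # for a managed table, this also removes the underlying data files
```

In a Databricks notebook, each string could be passed to `spark.sql(...)` to run it.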

External Tables 

  • External tables are partially controlled by Databricks, where only the metadata is managed while the data resides in external storage.
  • Data is stored in external locations such as AWS S3, Azure Blob Storage, or ADLS.
  • These tables can use formats such as Delta, Parquet, and CSV.
  • Dropping an external table does not delete the underlying data files; only the metadata is deleted.
  • Best suited for referencing existing datasets or integrating data stored in external systems without moving it into Databricks.
  • Metadata is maintained in Unity Catalog, pointing to the external data location.
  • Performance depends on the configuration and capabilities of the external storage system.
  • Unity Catalog governs the metadata and access through the table, but the files can also be reached directly in external storage, where the storage system's own access controls apply.
  • Migration is straightforward as the data is already stored externally.
  • Best suited for integrating external data lakes or datasets shared across multiple systems.
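
To make the external case concrete, here is a minimal Python sketch that builds the SQL for an external table. The helper `external_table_ddl`, the table name `main.demo.orders_ext`, and the path `s3://example-bucket/raw/orders` are all hypothetical. It also assumes the path is already covered by an external location configured in Unity Catalog. The explicit LOCATION clause is what makes the table external.

```python
def external_table_ddl(full_name, columns, location, fmt="DELTA"):
    """Build a CREATE TABLE statement for an external table.

    The LOCATION clause points the table at files that live in
    storage you manage yourself.
    """
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {full_name} ({cols}) "
        f"USING {fmt} LOCATION '{location}'"
    )


# Hypothetical table name and storage path
ddl = external_table_ddl(
    "main.demo.orders_ext",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    "s3://example-bucket/raw/orders",
)
drop = "DROP TABLE IF EXISTS main.demo.orders_ext"

print(ddl)
print(drop)  # removes only the table metadata; the files at the path remain
```

As with any SQL string, these could be run in a Databricks notebook with `spark.sql(...)`. Because the files stay in place after the drop, the same path can be registered again later or read by other systems.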