Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Do external tables automatically receive external updates?

dvmentalmadess
Valued Contributor

Based on the instructions for creating an external table (see: https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-a-table), I had assumed that external tables were a way to add an existing object store to Unity Catalog, and that once defined they would work just like managed tables. The documentation doesn't seem to specifically describe external tables as behaving differently, but then I read these two passages today:

  • "Only files in the exact directory are read; the read is not recursive"
  • "When you create a table using this method, the storage path is read only once, to prevent duplication of records…"

see: https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-a-table-from-the...
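For context, the pattern those passages describe is creating an external table directly over an existing storage path. A minimal sketch of that pattern (the catalog, schema, table name, and path below are hypothetical placeholders, not from the docs):

```sql
-- Hypothetical example of registering an external table over an existing path.
-- Per the quoted docs: only files in this exact directory are read (not
-- recursive), and the path is read only once, at creation time.
CREATE TABLE my_catalog.my_schema.my_external_table
USING PARQUET
LOCATION 's3://my-bucket/path/to/data';
```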

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@Mark Miller:

External tables in Databricks do not automatically receive external updates. When you create an external table in Databricks, you are essentially registering the metadata for an existing object store in Unity Catalog, which allows you to query the data using SQL.

When you query an external table, Databricks reads the data from the external storage location specified in the table definition. However, Databricks does not monitor the external storage location for updates or changes to the data. If you add new files to the external storage location or modify the existing files, you need to manually update the external table metadata in Unity Catalog using the MSCK REPAIR TABLE command to add the new partitions or files.

The documentation you mentioned is correct that when you create an external table using the method described, the storage path is read only once to prevent duplication of records. This means that if you add new files to the external storage location after creating the external table, these files will not be included in the table until you update the metadata using MSCK REPAIR TABLE.

In summary, external tables in Databricks do not automatically receive external updates. You need to manually update the metadata using the MSCK REPAIR TABLE command to add new partitions or files to the table.
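As a sketch of the refresh step described above (the table name is hypothetical; MSCK REPAIR TABLE applies to partitioned non-Delta tables, and REFRESH TABLE is an alternative for invalidating cached metadata):

```sql
-- After new files land in the external location, the table metadata is stale.
-- For a partitioned external table, re-scan the path for new partitions:
MSCK REPAIR TABLE my_catalog.my_schema.my_external_table;

-- REFRESH TABLE invalidates cached metadata so subsequent queries re-read it:
REFRESH TABLE my_catalog.my_schema.my_external_table;
```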


