
External Table refresh

turagittech
New Contributor III

Hi,

I have a blob storage area in Azure where JSON files are being created. I can create an external table on the storage blob container, but when new files are added I don't see the extra rows when I query the data. Is there a better approach to accessing the data? Should I load it differently?

3 REPLIES

ashraf1395
Honored Contributor

Hi @turagittech ,

External tables in Databricks do not automatically receive external updates. When you create an external table in Databricks, you are essentially registering metadata for data that already lives in object storage with Unity Catalog, which allows you to query the data using SQL.

When you query an external table, Databricks reads the data from the external storage location specified in the table definition. However, Databricks does not monitor the external storage location for updates or changes to the data. If you add new files to the external storage location or modify the existing files, you need to manually update the external table metadata in Unity Catalog using the MSCK REPAIR TABLE command to add the new partitions or files.

The documentation you mentioned is correct that when you create an external table using the method described, the storage path is read only once to prevent duplication of records. This means that if you add new files to the external storage location after creating the external table, these files will not be included in the table until you update the metadata using MSCK REPAIR TABLE.

In summary, external tables in Databricks do not automatically receive external updates. You need to manually update the metadata using the MSCK REPAIR TABLE command to add new partitions or files to the table.
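For example, a minimal sketch of this pattern, assuming the JSON files are laid out in partition directories; the table name, columns, and storage path below are hypothetical:

-- Hypothetical external table over partitioned JSON files in Azure storage.
CREATE TABLE IF NOT EXISTS json_events (
  payload STRING,
  ingest_date DATE
)
USING JSON
PARTITIONED BY (ingest_date)
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/events/';

-- After new partition directories land in the storage path, register them:
MSCK REPAIR TABLE json_events;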

https://docs.databricks.com/aws/en/tables/external

Hi,

Running MSCK REPAIR TABLE <tablename> returns this error: [DELTA_OPERATION_NOT_ALLOWED_DETAIL] Operation not allowed: `ALTER TABLE RECOVER PARTITIONS` is not supported for Delta tables.

This is what confused me about whether it was set up in the correct manner.

Nivethan_Venkat
New Contributor III

Hi @turagittech,

The above error indicates that your table is in DELTA format. Please check the table creation statement to see whether the table format is JSON or DELTA.

PS: By default, if you do not specify a format when creating the table on top of your JSON files in external storage, the table will be created in DELTA format.
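To verify what format the existing table was created with, a minimal sketch (the table name is hypothetical; the Provider row in the output shows whether the table is delta or json):

-- Inspect the table metadata and check the Provider field.
DESCRIBE TABLE EXTENDED json_events;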

The following link explains the error you are facing when running MSCK REPAIR: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-alter-table-manage-partiti...

Please check here for a suitable CREATE TABLE statement for your use case: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-create-table-using#syntax
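For illustration, a rough sketch of recreating the table with an explicit JSON format; the table name and path are hypothetical, and note that dropping an external table removes only the metadata, not the underlying files:

-- Drop the accidentally created Delta table definition; for an external
-- table the files in the storage location are not deleted.
DROP TABLE IF EXISTS json_events;

-- Recreate the table with an explicit JSON format over the same location.
CREATE TABLE json_events
USING JSON
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/events/';

-- Invalidate any cached file listing so newly added files become visible.
REFRESH TABLE json_events;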

Let us know if you find this helpful, or if you need more clarification.


Best Regards,
Nivethan V