How to update delta table in UC when underlying data in s3 changes?

HsChiang
New Contributor

I have a general question about how a table in Databricks changes when we change the underlying data files in S3. For example, say I convert a set of date-based folders under a single prefix to Delta with:

CONVERT TO DELTA parquet.`s3://example_bucket/test_1/` PARTITIONED BY (report_date DATE)

and then create a table on that external location with:

CREATE TABLE test_catalog.test_schema.test_1 USING DELTA LOCATION 's3://example_bucket/test_1/'

If I then add a new folder to that same location, how do we make the table in Databricks pick up the change as well?

1 REPLY

Hkesharwani
Contributor II

Hi,
In Databricks, an external table is linked to an existing object-store location by registering its metadata in Unity Catalog, which enables SQL queries on the data. However, Databricks doesn't automatically track changes or updates made directly in the external storage location backing the table. So if files are added or modified in that external location, you need to manually refresh the external table's metadata within Unity Catalog. This is done by executing the MSCK REPAIR TABLE command, which incorporates the new or updated partitions and files into the table.
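For the table from the question, a minimal sketch of what this looks like (the table name is taken from the post above; exact behavior and supported clauses depend on the table format and Databricks Runtime version, so please verify against the documentation linked below):

-- Re-scan the external location and register any new or changed
-- partitions/files in the table's metadata.
MSCK REPAIR TABLE test_catalog.test_schema.test_1;

-- Optionally drop any cached metadata for the table in the current
-- session so subsequent queries read the refreshed state.
REFRESH TABLE test_catalog.test_schema.test_1;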
Refer to the documentation below for more information:

https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#create-a-table-from-...

Harshit Kesharwani
Data engineer at Rsystema
