11-27-2023 05:48 AM
We are currently upgrading our Lakehouse to take advantage of Unity Catalog. We will mostly use external tables because all our Delta tables are already stored in Azure Storage. I am trying to figure out how to update the table property "delta.lastUpdateversion". Since table schemas can change over time, you want the external tables to have the latest schema version. As far as I have seen, the documentation mentions nothing about this table property. The data in the table did change when we updated the Delta table, but it seems you need to fully recreate the table if the schema has changed.
What is the best practice in this case? Recreate the external table? Or did I overlook something?
11-27-2023 07:53 AM
Hi @rudyevers, When dealing with Delta Lake tables and schema updates, there are a few best practices to consider:
Schema Updates:
Stream Termination:
Replacing the Whole Table:
Column Mapping for Renaming:
In summary, consider your specific use case:
11-28-2023 01:46 AM
Hi @Kaniz_Fatma
Thank you for your response. I am aware that you can change a Delta table with DDL statements, but in our case we write directly to storage and not through Unity Catalog. So when an external table is created referring to an external location, it takes the version at that specific moment. When the schema is changed afterwards, the external table is not updated because it is still referring to a previous version. That is also what the table property delta.lastUpdateversion is saying. So in this case it looks like you have to drop and recreate the external table so that delta.lastUpdateversion is the correct one.
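For readers who want to check the property value described here, a minimal sketch follows. It assumes a Databricks notebook where `spark` is available; the table name `main.sales.orders` and the helper `show_property_sql` are hypothetical and used only for illustration. The property key is spelled as it appears in this thread.

```python
def show_property_sql(table: str, key: str = "delta.lastUpdateversion") -> str:
    """Build a SHOW TBLPROPERTIES statement for a single property key."""
    return f"SHOW TBLPROPERTIES {table} ('{key}')"


# Hypothetical three-level Unity Catalog name, for illustration only.
stmt = show_property_sql("main.sales.orders")
print(stmt)

# In a Databricks notebook (assumption: `spark` is the active session):
# spark.sql(stmt).show(truncate=False)
```

Comparing the returned value with the latest version listed by `DESCRIBE HISTORY` on the same table would show whether the two have drifted apart.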
11-28-2023 01:54 AM
Hi @rudyevers, I appreciate your understanding of the intricacies of working with external tables and Delta Lake. Indeed, when you create an external table that references data in an external location, it captures the version of the data at that specific moment. Subsequent schema changes to the underlying data do not automatically update the external table, as it continues to reference the previous version. The delta.lastUpdateversion property reflects this behaviour.
To ensure that the external table reflects the most up-to-date schema, you're correct that dropping and recreating the external table is necessary. By doing so, you'll align the delta.lastUpdateversion with the current state of the data.
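The drop-and-recreate step could be sketched roughly as follows. This is only an illustration under stated assumptions: the table name, the `abfss://` path, and the helper `recreate_external_table_sql` are hypothetical, and `spark` is assumed to be the session in a Databricks notebook. Dropping an external table removes only the catalog entry and leaves the files in storage in place, and `CREATE TABLE ... USING DELTA LOCATION` without a column list picks up the schema from the existing Delta log.

```python
from typing import List


def recreate_external_table_sql(table: str, location: str) -> List[str]:
    """Return the statements that drop and re-register an external Delta table."""
    return [
        # Removes the catalog entry; the data files of an external table stay put.
        f"DROP TABLE IF EXISTS {table}",
        # Re-registers the table; the schema is read from the Delta log at `location`.
        f"CREATE TABLE {table} USING DELTA LOCATION '{location}'",
    ]


# Hypothetical names, for illustration only.
statements = recreate_external_table_sql(
    "main.sales.orders",
    "abfss://lake@examplestorage.dfs.core.windows.net/delta/orders",
)
for stmt in statements:
    print(stmt)
    # In a Databricks notebook (assumption): spark.sql(stmt)
```

Because the table object is dropped, any grants on it may need to be reapplied afterwards.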
If you have any further questions or need additional assistance, feel free to ask!
11-28-2023 03:09 AM - edited 11-28-2023 03:13 AM
Hi @Kaniz_Fatma,
OK! My assumption was right. So that is what we have to live with for now. But some sort of refresh table function would be nice for external Delta tables.
The project team was quite early in adopting Delta as a storage format and is willing to adopt more and more of the Databricks capabilities. But as an early bird you sometimes suffer from choices that were made in the past. Our whole data logistics process works in a way that we are not able to change it easily overnight (in a manner of speaking). But we will get there over time.
Thanks
11-29-2023 12:00 AM
I am in the same boat.
That is the reason I opted to use managed tables instead. OK, it means migrating tables and changing notebooks, but besides not having to struggle with external tables, you also get something in return (liquid clustering, for example).
11-29-2023 12:10 AM
Liquid clustering of course also exists for external tables. What I meant is all the upcoming AI features, which I doubt will be available for external tables.