we are currently looking for a way to persist and manage our unity catalog tables in an IaC manner. That is, we want to trace any changes to a table's schema and properties and ideally be able to roll back those changes seamlessly. Using the databricks_sql_table terraform resource does not seem to be the ideal solution since "changing the column definitions for a table will require dropping and re-creating the table" (https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_table). Also in this issue (https://github.com/databricks/terraform-provider-databricks/issues/1624) it is mentioned that using the API and terraform is discouraged. Is there a better alternative to managing tables? We were thinking of having notebooks with DDL scripts alongside our applications that define the tables.
Thank you for your reply! I'm not sure if audit logs help in this case. I meant tracking changes more in the sense of code versioning. I found this answer https://community.databricks.com/t5/data-engineering/how-to-deploy-azure-databricks-sql-table-views-... which looks promising but I'm curious if there are better solutions. Ideally there would be some solution similar to terraform that detects and applies changes to a table and can, unlike terraform, run "ALTER TABLE ..." statements to adapt the schema without dropping the original table.
Hey @Volker , I was looking into the same around 1 year ago. I've hit quickly the issue mentioned by you about the limitations in terraform, I had to drop this idea for delta tables. That time we went with the custom python/sql scripts with custom logic to drop/re-create external tables if the schema have changed, for delta tables we leverage schema evolution, but still update the file containing tables schema. Today I would try liquibase: https://medium.com/dbsql-sme-engineering/database-change-management-on-lakehouse-with-databricks-sql...
Change is hard. Change in software is even harder, so with any software project, engineers set up a process for change management using a series of tools such as Git, GitActions, AzureDevOps...
As you mentioned, using notebooks with Data Definition Language (DDL) scripts is a viable option. You can create notebooks that contain the table creation scripts and version control these notebooks along with your application code.
Welcome to Databricks Community: Lets learn, network and celebrate together
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.