We are currently looking for a way to persist and manage our Unity Catalog tables in an IaC manner. That is, we want to trace any changes to a table's schema and properties and ideally be able to roll back those changes seamlessly. The databricks_sql_table terraform resource does not seem to be the ideal solution, since "changing the column definitions for a table will require dropping and re-creating the table" (https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_table). This issue (https://github.com/databricks/terraform-provider-databricks/issues/1624) also mentions that mixing the API and terraform is discouraged. Is there a better alternative for managing tables? We were thinking of keeping notebooks with DDL scripts alongside our applications to define the tables.
Thank you for your reply! I'm not sure audit logs help in this case. I meant tracking changes more in the sense of code versioning. I found this answer https://community.databricks.com/t5/data-engineering/how-to-deploy-azure-databricks-sql-table-views-... which looks promising, but I'm curious if there are better solutions. Ideally there would be a terraform-like solution that detects and applies changes to a table but, unlike terraform, runs "ALTER TABLE ..." statements to adapt the schema without dropping the original table.
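To make it concrete, here's a rough sketch of the kind of diff-and-ALTER logic I have in mind (the table name, desired schema, and helper logic are all made up for illustration, not an existing tool):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

TABLE = "main.sales.orders"  # hypothetical Unity Catalog table
# Desired schema as it would live in version control.
desired = {"order_id": "BIGINT", "amount": "DECIMAL(10,2)", "region": "STRING"}

# Current columns as reported by the catalog; skip blank/metadata rows.
current = {
    row["col_name"]: row["data_type"]
    for row in spark.sql(f"DESCRIBE TABLE {TABLE}").collect()
    if row["col_name"] and not row["col_name"].startswith("#")
}

# Add columns that exist in the versioned schema but not in the table;
# unlike terraform's drop-and-recreate, this preserves existing data.
missing = {c: t for c, t in desired.items() if c not in current}
if missing:
    cols = ", ".join(f"{c} {t}" for c, t in missing.items())
    spark.sql(f"ALTER TABLE {TABLE} ADD COLUMNS ({cols})")
```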
Hey @Volker, I was looking into the same thing around a year ago. I quickly hit the terraform limitation you mentioned and had to drop that idea for delta tables. At the time we went with custom Python/SQL scripts: custom logic to drop and re-create external tables if the schema had changed, and schema evolution for delta tables, while still updating the file containing the table schemas. Today I would try liquibase: https://medium.com/dbsql-sme-engineering/database-change-management-on-lakehouse-with-databricks-sql...
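For the delta-table path, the schema evolution we relied on boils down to mergeSchema on write. A rough sketch (the table name and incoming batch are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative incoming batch that carries a new column ("region").
df = spark.createDataFrame(
    [(1, 9.99, "EMEA")], ["order_id", "amount", "region"]
)

(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")      # evolve the table schema on write
   .saveAsTable("main.sales.orders"))  # hypothetical UC table
```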
AUTHOR UPDATE (7.4.2024) - The liquibase connector has a new release, v1.2, which should now be the minimum required version to run in production.
As you mentioned, using notebooks with Data Definition Language (DDL) scripts is a viable option. You can create notebooks that contain the table creation scripts and version control these notebooks along with your application code.
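For example, a minimal sketch of such a DDL notebook cell (catalog, schema, and table names are placeholders; in a Databricks notebook, spark is already available):

```python
# Version-controlled DDL cell; CREATE TABLE IF NOT EXISTS keeps it
# idempotent, so re-running a deployment does not touch existing data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT COMMENT 'Primary key',
        amount   DECIMAL(10,2),
        region   STRING
    )
    USING DELTA
""")
```

Incremental schema changes can then be committed alongside it as further "ALTER TABLE ..." statements, so your git history doubles as the change log for the table.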