<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Schema migration of production delta tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/81907#M36445</link>
    <description>&lt;P&gt;hey everyone!&lt;BR /&gt;coming back with an update: we ended up migrating all our previous SQL scripts to a DBT project (&lt;A href="https://www.getdbt.com/" target="_blank"&gt;https://www.getdbt.com/&lt;/A&gt;) - the reason for this was explicitly to make it easy to handle schema migrations in a hive metastore (we can't/won't migrate to unity catalog on a short notice).&lt;/P&gt;&lt;P&gt;What got me sold on DTB (the solution was proposed by someone smarter than me :))&amp;nbsp; was the fact that you can basically instruct it to handle migrations itself in a one-liner:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;+on_schema_change&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"sync_all_columns"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Hope this helps &amp;amp; good luck!&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Mon, 05 Aug 2024 14:37:05 GMT</pubDate>
    <dc:creator>zerobugs</dc:creator>
    <dc:date>2024-08-05T14:37:05Z</dc:date>
    <item>
      <title>Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55438#M30330</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Goal&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;We would like to be in control of schema migrations of delta tables in all dev and production environments, and it must be automatically deployed.&lt;/P&gt;&lt;P&gt;I anticipated this to be a common problem with a well-known standard solution. But unfortunately, I haven't come across a good solution yet.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Current Solution&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;In our project we have created a custom tool/concept where SQL scripts are named in a way that ensures that they are executed in the correct order. In each environment, we store a file keeping track of which of the migrations (each script) has been executed. During deployment in CD, each migration that hasn't previously been executed is executed.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The Problem&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;1. The first problem is that we'd like a standard solution.&lt;/P&gt;&lt;P&gt;2. The second problem is that the metadata is lost when the Databricks (for any reason) is recreated. This is a problem until we eventually switch from hive metastore to Unity Catalogue. In this situation a migration renaming a column using the column mapping feature will be registered as being executed, but the column name change is lost.&lt;/P&gt;&lt;P&gt;What are best practices or recommendations for these problems?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Dec 2023 14:26:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55438#M30330</guid>
      <dc:creator>BjarkeM</dc:creator>
      <dc:date>2023-12-18T14:26:09Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55439#M30331</link>
      <description>&lt;P&gt;Have you explored the use of Terraform to help manage deployments? I believe that maybe what you're looking for. We use environment variables in project folders that drive what parameters are used, including the pipelines/dlt schemas.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Dec 2023 14:33:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55439#M30331</guid>
      <dc:creator>RyanD-AgCountry</dc:creator>
      <dc:date>2023-12-18T14:33:43Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55440#M30332</link>
      <description>&lt;P&gt;Thx&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22506"&gt;@RyanD-AgCountry&lt;/a&gt;&amp;nbsp;We use terraform. Can you elaborate on your solution?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Dec 2023 14:43:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55440#M30332</guid>
      <dc:creator>BjarkeM</dc:creator>
      <dc:date>2023-12-18T14:43:26Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55442#M30334</link>
      <description>&lt;P&gt;Sure, I'll do my best! While I'm unable to share exact internal code, I can reference boilerplate info.&lt;/P&gt;&lt;P&gt;The Terraform provider for &lt;A href="https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/pipeline" target="_self"&gt;databricks_pipeline&lt;/A&gt; has a &lt;EM&gt;target&amp;nbsp;&lt;/EM&gt;parameter, where you can point the pipeline to the intended catalog location. Our &lt;EM&gt;.tf&lt;/EM&gt; file for pipelines uses something along the lines of this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;resource "databricks_pipeline" "pipeline" {
    provider = databricks.workspace
    for_each = var.pipelines
    name = each.key

    development = var.dlt_development
    continuous = false
    channel = "PREVIEW"
    photon = true
    edition = "ADVANCED"
    catalog = "${var.env}_${each.value["catalog"]}"
    target = each.value["target"]

    configuration = {
        "pipelines.clusterShutdown.delay" = "10m"
    }
    cluster {
        label = "default"
        autoscale {
            min_workers = each.value["minWorkers"]
            max_workers = each.value["maxWorkers"]
            mode = "ENHANCED"
        }
    }
    library {
        notebook {
            path = each.value["notebookpath"]
        }
    }
}&lt;/LI-CODE&gt;&lt;P&gt;Then, when our devops CICD yml script runs, we call the resulting &lt;EM&gt;.tfvars&lt;/EM&gt; files to pull in variables and parameters. In the case of pipelines, we have standard json notation similar to this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;pipelines = {
    "pipeline1" = {
        catalog = "bronze"
        target = "database1"
        notebookpath = "/Repos/Engineering/project/notebook..."
    }
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Dec 2023 14:51:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55442#M30334</guid>
      <dc:creator>RyanD-AgCountry</dc:creator>
      <dc:date>2023-12-18T14:51:51Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55514#M30356</link>
      <description>&lt;P&gt;Thx again&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22506"&gt;@RyanD-AgCountry&lt;/a&gt;&amp;nbsp;I, however, don't understand how it is a solution to our problem. I'm talking about when - say - a delta table schema changes and is being migrated in the CD together with the deployment of the updated code working on the table with the new schema.&lt;/P&gt;&lt;P&gt;And as we not yet use UC the delta table meta data - like column name mappings executed as part of the migrations - are lost when/if the Databricks workspace is re-established.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Dec 2023 17:26:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55514#M30356</guid>
      <dc:creator>BjarkeM</dc:creator>
      <dc:date>2023-12-19T17:26:15Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55523#M30360</link>
      <description>&lt;P&gt;I apologize, I misunderstood the problem. I don't believe there is any legacy hive_metastore support for migrations of schemas between environments at the workspace level.&lt;/P&gt;&lt;P&gt;I do believe that UC and a combination use of live tables, schema evolution, and terraform may help once you get there, or it may help rebase the approach you are taking.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Dec 2023 19:37:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/55523#M30360</guid>
      <dc:creator>RyanD-AgCountry</dc:creator>
      <dc:date>2023-12-19T19:37:43Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/63811#M32362</link>
      <description>&lt;P&gt;Hello, so does this mean that it's necessary to migrate away from hive_metastore to unity_catalog in order to be able to use schema migrations?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Mar 2024 10:35:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/63811#M32362</guid>
      <dc:creator>zerobugs</dc:creator>
      <dc:date>2024-03-15T10:35:14Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/80971#M36180</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Same question here, and I couldn't find much content about how to handle schema changes deployment.&lt;/P&gt;&lt;P&gt;I'm relatively new to databricks so there may be some obvious functionalities that I'm missing but let's say I need to deploy some schema changes like adding columns to a table. I can add the code in dev but this will only need to run once for schema changes and not everytime the job needs to run, and I don't want to handle that by either running the command manually in prod or managing errors or if statements in the job.&amp;nbsp;&lt;/P&gt;&lt;P&gt;What are the best practices for that ? Having a specific, or a specific job in a repo that is not scheduled but runs once ?&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks !&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jul 2024 12:59:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/80971#M36180</guid>
      <dc:creator>MikaelB</dc:creator>
      <dc:date>2024-07-29T12:59:54Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/81907#M36445</link>
      <description>&lt;P&gt;hey everyone!&lt;BR /&gt;coming back with an update: we ended up migrating all our previous SQL scripts to a DBT project (&lt;A href="https://www.getdbt.com/" target="_blank"&gt;https://www.getdbt.com/&lt;/A&gt;) - the reason for this was explicitly to make it easy to handle schema migrations in a hive metastore (we can't/won't migrate to unity catalog on a short notice).&lt;/P&gt;&lt;P&gt;What got me sold on DTB (the solution was proposed by someone smarter than me :))&amp;nbsp; was the fact that you can basically instruct it to handle migrations itself in a one-liner:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;+on_schema_change&lt;/SPAN&gt;&lt;SPAN&gt;: &lt;/SPAN&gt;&lt;SPAN&gt;"sync_all_columns"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Hope this helps &amp;amp; good luck!&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 05 Aug 2024 14:37:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/81907#M36445</guid>
      <dc:creator>zerobugs</dc:creator>
      <dc:date>2024-08-05T14:37:05Z</dc:date>
    </item>
    <item>
      <title>Re: Schema migration of production delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/93313#M38673</link>
      <description>&lt;P&gt;&lt;A href="https://github.com/liquibase/liquibase-databricks" target="_blank"&gt;GitHub - liquibase/liquibase-databricks&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2024 14:39:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/schema-migration-of-production-delta-tables/m-p/93313#M38673</guid>
      <dc:creator>worlordv</dc:creator>
      <dc:date>2024-10-09T14:39:50Z</dc:date>
    </item>
  </channel>
</rss>

