
table deployment (DDL) from one catalog to another

dpc
New Contributor III

Hello

We have a development, a test, and a production environment.

How do you generally deploy DDL changes?

So, you alter a table in development and then apply the change to test, then production.

e.g.

table1 has column1, column2, column3

I add column4

I now want to deploy this change to production

I also want to retain the existing data.
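
In Databricks SQL, the change itself would be something like this (assuming column4 is a STRING; adjust the type as needed):

    -- On a Delta table, adding a column is a metadata-only change,
    -- so existing rows are kept (column4 is NULL for them)
    ALTER TABLE table1 ADD COLUMN column4 STRING;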


Thanks.

2 REPLIES

filipniziol
New Contributor III

Hi @dpc,

  • Managing DDL changes across environments (dev, test, prod) should be part of your CI/CD pipeline.
  • Create a Deployments folder with subfolders, where each deployment is represented by a timestamped folder. Example folder structure:


/Deployments
    /Deployments_20240924
        /01_add_column4_table1.sql
        /02_alter_other_table.sql
    /Deployments_20240930
        /01_create_new_table.sql
        /02_update_table1.sql


  • Inside the folder, each script (e.g., 01_add_column4_table1.sql) represents an individual DDL change (like adding a column or altering a table); a sketch of such a script follows this list.
  • You can then integrate your CI/CD pipeline with the deployment folder:
    1. Add the deployment folder name to your CI/CD pipeline
    2. After your notebooks are deployed, add a step to the pipeline that runs the notebooks inside the provided deployment folder
  • Alternatively, for every deployment you can create a deployment Job. Each job will have multiple tasks, where each task represents a specific DDL change. In the example, you would create a Deployment_20240924 job with two tasks: one to run 01_add_column4_table1.sql and another to run 02_alter_other_table.sql.
    1. Running the job could also be a step in your CI/CD pipeline
    2. Alternatively, the job could already be scheduled to run at a given time (but you need to know when your deployment is going to happen)
    3. Alternatively, the job could be run manually after the CI/CD pipeline finishes deploying the notebooks
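
As a sketch, a script like 01_add_column4_table1.sql could contain something like this (the catalog/schema names and the STRING type are placeholders; point USE CATALOG at whichever environment you are deploying to):

    -- 01_add_column4_table1.sql
    -- Select the target environment's catalog (dev, test, or prod)
    USE CATALOG prod_catalog;
    USE SCHEMA my_schema;

    -- Metadata-only change on a Delta table: existing rows are retained,
    -- and column4 is NULL for them
    ALTER TABLE table1 ADD COLUMN column4 STRING;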

dpc
New Contributor III

Thanks.

I'll step through this solution and see if I can get it working.
