Re: How to deploy Azure Databricks Sql (Table, Vie...

mmlime · ‎10-21-2022

Hi,

I agree with @Josef Prakljacic . If Databricks would like to compete SQL DWH/Synapse or Snowflake and target DWH users, It should prepare some guidelines how to manage "database" objects.

Yea, @Werner Stinckens with Engineering workspace and pySpark style notebooks you create your object directly from write statements. But now, with Delta, Delta Lake and mainly Identity columns you need to create your tables ahead, when you are creating your environment.

And sorry, but it is not only about creating tables. Later in production when tables are populated you maybe wants to rename column, or some other alter table operation. And in the best scenario you would like to use descriptive language (DDL) to do that and test it in lower environments. (DB Project for SQL Server is great example).

So lets discuss, how do you solve this in your production?

use case.:

I have a table `Customer` with 10000 records already in production. I would like to change column name `surname` to `lastName`.

I have std. dev flow. DEV env -> TEST -> STAGE -> PROD.

How to prepare change in DEV, test in in TEST a and propagate to PROD?

I can start:

I have customer.sql file with table DDL.

I will add there a command to ALTER TABLE.

I run this SQL at the beginning of CD pipeline.

I will test my pipelines after change in TEST and if succeed deploy to PROD.

OK fine greate. But now I have ALTER TABLE cmd in my SQL file and it fail with next CD pipeline run, cause `surname` column does not more exist.

How did you solve this @Josef Prakljacic ?

Thanks.