Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parametrizing queries in DAB deployments

malterializedvw
New Contributor II

Hi folks,
I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments to different environments, as well as [dev] deployments vs. production deployments.
Notebook tasks and DLT jobs are a well-working example: I can pass all sorts of parameters. I can parametrize not only the environment (dev/test/prod) but also database objects (e.g. tables) with an identifier for [dev] deployments (so they don't interfere with the regular database objects).
The problem arises with materialized views. For those I must use a SQL warehouse, so I can't use the existing tricks. Neither can I use a (parametrizable) Databricks Query object, as those are not committable into repos. I am currently using a sql_task on a flat file which contains a hard-coded query (see below).
Using the power of sed in the deployment process, I am able to handle the environment. But [dev] deployments are not covered by that, and I secretly hope for a better solution than sed.
Is there?
Thanks and best regards
```

        - task_key: task_example
          sql_task:
            file:
              path: /Workspace/${var.path}/files/query.sql
              source: WORKSPACE
            warehouse_id: abc123
```

Ashwin_DSA
Databricks Employee

Hi @malterializedvw,

You don’t need Databricks Query objects for this. Have you considered using bundle variables + SQL task parameters on the .sql file? Check this documentation.

The good thing is that sql_task supports a parameters map. Any key you define there can be referenced in your SQL file as {{parameter_key}} and will be substituted by Databricks at run time.

As an example, you could do something like this in your query.sql:

CREATE OR REPLACE MATERIALIZED VIEW
  {{catalog}}.{{schema}}.orders_mv{{table_suffix}}
AS
SELECT *
FROM {{catalog}}.{{schema}}.orders{{table_suffix}};

That way, you can use the same SQL in multiple environments without having to use sed.

Environment differences can be handled via targets + variable overrides.
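To make that concrete, the task definition might look something like the sketch below. The variable names mirror the placeholders in the SQL above ({{catalog}}, {{schema}}, {{table_suffix}}); the job/task keys and warehouse variable are illustrative, not prescriptive:

```yaml
resources:
  jobs:
    mv_job:
      name: mv_job
      tasks:
        - task_key: refresh_orders_mv
          sql_task:
            file:
              path: /Workspace/${var.path}/files/query.sql
              source: WORKSPACE
            warehouse_id: ${var.warehouse_id}
            # Each key here becomes available as {{key}} in query.sql
            parameters:
              catalog: ${var.catalog}
              schema: ${var.schema}
              table_suffix: ${var.table_suffix}
```

The ${var.*} references are resolved per target at deploy time, so dev and prod can supply different catalogs and suffixes without touching the SQL file.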

Does this help?

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Pat
Esteemed Contributor

Not sure if I understand correctly, but is the issue that you are using a .sql file that has the environment hardcoded?

Use bundle substitutions/variables inside the SQL file itself

Databricks Asset Bundles support variable substitution in YAML via ${var.name}, and those values can be forwarded into SQL files referenced by tasks as named parameters (:name). The pattern:
databricks.yml:

bundle:
  name: my-bundle

variables:
  env:
    default: dev
  table_suffix:
    default: _dev

targets:
  dev:
    default: true
    variables:
      env: dev
      table_suffix: _dev
  prod:
    variables:
      env: prod
      table_suffix: ""

SQL:

CREATE OR REPLACE MATERIALIZED VIEW IDENTIFIER(
  :env || '_gold.schema_name.mv_name'
) AS
SELECT *
FROM IDENTIFIER(
  :env || '_silver.schema_name.src_table_name'
);

job.yml:

resources:
  jobs:
    mv-job:
      name: mv-job
      tasks:
        - task_key: mv_task
          sql_task:
            file:
              path: ./sql/mv_example.sql   # committed in repo
              source: WORKSPACE
            warehouse_id: abc123
            parameters:
              env: ${var.env}   # picks up the per-target override
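
With the target overrides above, selecting the environment is then just a matter of the deploy target (a sketch; assumes the default Databricks CLI bundle commands):

```shell
# Deploy the dev target (default): env=dev, table_suffix=_dev
databricks bundle deploy -t dev

# Deploy prod: env=prod, empty suffix
databricks bundle deploy -t prod
```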