Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Parametrizing queries in DAB deployments

malterializedvw
New Contributor III

Hi folks,
I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments on different environments, as well as [dev] deployments vs. production deployments.
A well-working example is notebook tasks or DLT jobs: I can pass all sorts of parameters. I can parametrize not only the environment (dev/test/prod) but also database objects (e.g. tables) with an identifier for [dev] deployments (so they don't interfere with the regular database objects).
The problem arises with materialized views. For those I must use a SQL warehouse, so I can't use the existing tricks. Neither can I use a (parametrizable) Databricks Query object, as those are not committable into repos. I am currently using a sql_task on a flat file which contains a hard-coded query (see below).
Using the power of sed in the deployment process, I am indeed able to handle the environment. But [dev] deployments are not handled by that, and I secretly hope for a better solution than sed.
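(For illustration, the sed step looks roughly like this — the placeholder token and file names here are made up:)

```shell
# Toy version of the substitution step (token and paths are illustrative):
# the committed template holds an __ENV__ placeholder...
mkdir -p src files
printf 'CREATE MATERIALIZED VIEW data_catalog___ENV__.s.mv AS SELECT 1;\n' > src/query.sql.tpl

# ...and sed bakes the environment in at deploy time.
ENV=prod
sed "s/__ENV__/${ENV}/g" src/query.sql.tpl > files/query.sql
```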
Is there?
Thanks and best regards
```

        - task_key: task_example
          sql_task:
            file:
              path: /Workspace/${var.path}/files/query.sql
              source: WORKSPACE
            warehouse_id: abc123
```

Ashwin_DSA
Databricks Employee

Hi @malterializedvw,

You don't need Databricks Query objects for this. Have you considered using bundle variables + SQL task parameters on the .sql file? Check this documentation.

The good thing is that sql_task supports a parameters map. Any key you define there can be referenced in your SQL file as {{parameter_key}} and will be substituted by Databricks at run time.

As an example, you could do something like this in your query.sql:

CREATE OR REPLACE MATERIALIZED VIEW
  {{catalog}}.{{schema}}.orders_mv{{table_suffix}}
AS
SELECT *
FROM {{catalog}}.{{schema}}.orders{{table_suffix}};
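And in the job YAML, the matching parameters map could look roughly like this (the warehouse id and parameter values are placeholders):

```yaml
- task_key: mv_task
  sql_task:
    file:
      path: /Workspace/${var.path}/files/query.sql
      source: WORKSPACE
    warehouse_id: abc123
    parameters:
      catalog: main
      schema: analytics
      table_suffix: ${var.table_suffix}
```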

That way, you can use the same SQL in multiple environments without having to use sed.

Environment differences can be handled via targets + variable overrides.

Does this help?

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Hi Ashwin, this approach doesn't work for me because of the apostrophes that get injected. Sorry for the bad formatting, I am running into errors trying to send it properly.

- task_key: example
  sql_task:
    file:
      path: query.sql
      source: WORKSPACE
    warehouse_id: abc
    parameters:
      catalog: data_catalog_${var.environment}
      schema: source_schema
      table: mat_view
CREATE MATERIALIZED VIEW IF NOT EXISTS {{catalog}}.{{schema}}.{{table}}
[PARSE_SYNTAX_ERROR] Syntax error 
CREATE MATERIALIZED VIEW IF NOT EXISTS
'data_catalog_PROD'.'source_schema'.'mat_view'

Pat
Esteemed Contributor

Not sure if I understand correctly, but the issue is that you are using a .sql file that has a hardcoded env?

Use bundle substitutions/variables inside the SQL file itself

Databricks Asset Bundles support variable substitution not only in YAML but also in SQL files referenced by tasks, via the same {{var.name}} syntax. The pattern:
databricks.yml:

bundle:
  name: my-bundle

variables:
  env:
    default: dev
  table_suffix:
    default: _dev

targets:
  dev:
    default: true
    variables:
      env: dev
      table_suffix: _dev
  prod:
    variables:
      env: prod
      table_suffix: ""

SQL:

CREATE OR REPLACE MATERIALIZED VIEW IDENTIFIER(
  :env || '_gold.schema_name.mv_name'
) AS
SELECT *
FROM IDENTIFIER(
  :env || '_silver.schema_name.src_table_name'
);

job.yml:

resources:
  jobs:
    mv-job:
      name: mv-job
      tasks:
        - task_key: mv_task
          sql_task:
            file:
              path: ./sql/mv_example.sql   # committed in repo
              source: WORKSPACE
            warehouse_id: abc123
            parameters:
              env: dev

malterializedvw
New Contributor III

Hey @Pat, thanks for the answer, I didn't know about the IDENTIFIER clause and it makes me hopeful.
Yes, you got me right. So, I tried to apply your suggestion to my problem, defining catalog, schema and table in the job.yml and then updating my query.sql as follows:

CREATE MATERIALIZED VIEW IF NOT EXISTS IDENTIFIER(
  :catalog || :schema || :table
)

Also I tried this:

:catalog || '.' || :schema || '.' || :table

Neither worked, though:
Exactly one CREATE [ LIVE TABLE | MATERIALIZED VIEW | STREAMING TABLE ] statement expected, but 0 found

Hi @malterializedvw,

Try this: build your .sql file like the one below.

CREATE MATERIALIZED VIEW IF NOT EXISTS
  IDENTIFIER('{{catalog}}.{{schema}}.{{table}}')
AS
SELECT
  ...
FROM
  ...;


If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Hi @Ashwin_DSA, this does not work either.
[PARSE_SYNTAX_ERROR] Syntax error at or near 'catalog'. SQLSTATE: 42601 (line 1, pos 52) == SQL == CREATE MATERIALIZED VIEW IF NOT EXISTS IDENTIFIER(''catalog.schema.table'')

malterializedvw
New Contributor III

The best I could come up with, after some back and forth with Claude, is this:

DECLARE target_fqn STRING;
DECLARE source_fqn STRING;
DECLARE q STRING;
DECLARE sql_stmt STRING;
SET VAR target_fqn = :target_fqn;
SET VAR source_fqn = :source_fqn;
SET VAR q = char(39);
SET VAR sql_stmt = CONCAT(
  'CREATE MATERIALIZED VIEW IF NOT EXISTS ', target_fqn,
  ' TBLPROPERTIES (', q, 'delta.columnMapping.mode', q, ' = ', q, 'name', q, ')',
  ' TRIGGER ON UPDATE',
  ' AS',
  ' select',
  '   cast(abc as TIMESTAMP) as `A B C`,',
  '   cast(xyz as TIMESTAMP) as `X Y Z`,',

  ' from ', source_fqn
);
EXECUTE IMMEDIATE sql_stmt;


There must be a better way in Databricks? 🤔

-werners-
Esteemed Contributor III

Hm, the IDENTIFIER({{var}} || string) should work for create statements with DAB.
I also spent way too much time with AI giving me wrong answers (Jinja templating format, in the first place).
Mind that there are no spaces in {{var}}.

BUT there are some limitations, e.g. in defining FKs (REFERENCES). That does not work (yet).
Dunno if this helps