3 weeks ago
Hi folks,
I would like to ask for best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments to different environments, as well as between [dev] deployments and production deployments.
Notebook tasks and DLT jobs are a well-working example: I can pass all sorts of parameters. I can parametrize not only the environment (dev/test/prod) but also database objects (e.g. tables) with an identifier for [dev] deployments (so they don't interfere with the regular database objects).
The problem arises with materialized views. For those I must use a SQL warehouse, so I can't use the existing tricks. Neither can I use a (parametrizable) Databricks Query object, as those are not committable into repos. I am currently using a sql_task on a flat file which contains a hard-coded query (see below).
Using the power of sed in the deployment process, I am indeed able to handle the environment. But [dev] deployments are not covered by that, and I secretly hope for a better solution than sed.
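For illustration, the sed step in my deployment looks roughly like this (the file paths and the `__ENV__` placeholder token are made up for this sketch, not my real setup):

```shell
# Sketch of the sed-based substitution step: a template .sql file contains
# an invented __ENV__ placeholder that sed replaces at deploy time.
mkdir -p /tmp/dab_demo
printf 'CREATE MATERIALIZED VIEW data_catalog___ENV__.src.mv AS SELECT 1;\n' \
  > /tmp/dab_demo/query.sql.tpl
sed 's/__ENV__/prod/g' /tmp/dab_demo/query.sql.tpl > /tmp/dab_demo/query.sql
cat /tmp/dab_demo/query.sql
# -> CREATE MATERIALIZED VIEW data_catalog_prod.src.mv AS SELECT 1;
```

This handles dev/test/prod, but the per-developer [dev] suffixing is exactly what it does not cover.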
Is there?
Thanks and best regards
3 weeks ago
Hi @malterializedvw,
You don't need Databricks Query objects for this. Have you considered using bundle variables + SQL task parameters on the .sql file? Check this documentation.
The good thing is that sql_task supports a parameters map. Any key you define there can be referenced in your SQL file as {{parameter_key}} and will be substituted by Databricks at run time.
As an example, you would do something like this in your query.sql:
CREATE OR REPLACE MATERIALIZED VIEW
{{catalog}}.{{schema}}.orders_mv{{table_suffix}}
AS
SELECT *
FROM {{catalog}}.{{schema}}.orders{{table_suffix}};
That way, you can use the same SQL file in multiple environments without having to use sed.
Environment differences can be handled via targets + variable overrides.
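A minimal sketch of how the pieces could fit together (the variable names, the warehouse id, and the catalog naming are placeholders here, not your actual setup):

```yaml
# databricks.yml (sketch) -- declare variables and override them per target
variables:
  environment:
    default: dev
  table_suffix:
    default: _dev        # suffix for [dev] deployments

targets:
  dev:
    default: true
  prod:
    variables:
      environment: prod
      table_suffix: ""   # no suffix in prod

# job yaml (sketch) -- every key under `parameters` becomes {{key}} in the SQL file
resources:
  jobs:
    mv_job:
      tasks:
        - task_key: mv_task
          sql_task:
            file:
              path: query.sql
              source: WORKSPACE
            warehouse_id: abc123               # placeholder
            parameters:
              catalog: data_catalog_${var.environment}
              schema: source_schema
              table_suffix: ${var.table_suffix}
```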
Does this help?
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
2 weeks ago
Hi Ashwin, this approach wouldn't work for me due to the quotes injected around the parameter values. Sorry for the bad formatting, I am running into errors trying to send it properly.
- task_key: example
  sql_task:
    file:
      path: query.sql
      source: WORKSPACE
    warehouse_id: abc
    parameters:
      catalog: data_catalog_${var.environment}
      schema: source_schema
      table: mat_view
CREATE MATERIALIZED VIEW IF NOT EXISTS {{catalog}}.{{schema}}.{{table}} → [PARSE_SYNTAX_ERROR] Syntax error
CREATE MATERIALIZED VIEW IF NOT EXISTS
'data_catalog_PROD'.'source_schema'.'mat_view'
3 weeks ago
Not sure if I understand correctly, but the issue is that you are using a .sql file that has a hardcoded env?
Use bundle substitutions/variables inside the SQL file itself
Databricks Asset Bundles support variable substitution not only in YAML but also in SQL files referenced by tasks, via the same {{var.name}} syntax. The pattern:
databricks.yml:
bundle:
  name: my-bundle

variables:
  env:
    default: dev
  table_suffix:
    default: _dev

targets:
  dev:
    default: true
    variables:
      env: dev
      table_suffix: _dev
  prod:
    variables:
      env: prod
      table_suffix: ""

SQL:
CREATE OR REPLACE MATERIALIZED VIEW IDENTIFIER(
  :env || '_gold.schema_name.mv_name'
) AS
SELECT *
FROM IDENTIFIER(
  :env || '_silver.schema_name.src_table_name'
);

job.yml:
resources:
  jobs:
    mv-job:
      name: mv-job
      tasks:
        - task_key: mv_task
          sql_task:
            file:
              path: ./sql/mv_example.sql  # committed in repo
              source: WORKSPACE
            warehouse_id: abc123
            parameters:
              env: dev
2 weeks ago
Hey @Pat, thanks for the answer. I didn't know about the IDENTIFIER clause and it makes me hopeful.
Yes, you got me right. So, I tried to apply your suggestion to my problem, defining catalog, schema and table in the job.yml and then updating my query.sql as follows:
CREATE MATERIALIZED VIEW IF NOT EXISTS IDENTIFIER(
:catalog || :schema || :table
)
Also I tried this:
:catalog || '.' || :schema || '.' || :table
Neither worked though.
Exactly one CREATE [ LIVE TABLE | MATERIALIZED VIEW | STREAMING TABLE ] statement expected, but 0 found
a week ago
Hi @malterializedvw,
Try this: build your .sql file like the one below.
CREATE MATERIALIZED VIEW IF NOT EXISTS
IDENTIFIER('{{catalog}}.{{schema}}.{{table}}')
AS
SELECT
...
FROM
...;
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
a week ago
Hi @Ashwin_DSA , this does not work either.
[PARSE_SYNTAX_ERROR] Syntax error at or near 'catalog'. SQLSTATE: 42601 (line 1, pos 52) == SQL == CREATE MATERIALIZED VIEW IF NOT EXISTS IDENTIFIER(''catalog.schema.table'')
a week ago - last edited a week ago
The best I could come up with after some back and forth with Claude is this:
DECLARE target_fqn STRING;
DECLARE source_fqn STRING;
DECLARE q STRING;
DECLARE sql_stmt STRING;
SET VAR target_fqn = :target_fqn;
SET VAR source_fqn = :source_fqn;
SET VAR q = char(39);  -- single-quote character, to avoid quote-escaping headaches
SET VAR sql_stmt = CONCAT(
  'CREATE MATERIALIZED VIEW IF NOT EXISTS ', target_fqn,
  ' TBLPROPERTIES (', q, 'delta.columnMapping.mode', q, ' = ', q, 'name', q, ')',
  ' TRIGGER ON UPDATE',
  ' AS',
  ' select',
  ' cast(abc as TIMESTAMP) as `A B C`,',
  ' cast(xyz as TIMESTAMP) as `X Y Z`',
  ' from ', source_fqn
);
EXECUTE IMMEDIATE sql_stmt;
There must be a better way in Databricks?
a week ago
Hm, the IDENTIFIER({{var}} || string) pattern should work for CREATE statements with DAB.
I also spent way too much time on AI giving me wrong answers (Jinja templating format in the first place).
Mind that there are no spaces inside {{var}}.
BUT there are some limitations, e.g. in defining foreign keys (REFERENCES). That does not work (yet).
Dunno if this helps
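To make the pattern concrete, a sketch of what such a .sql file could look like (the object names are made up; {{catalog}}, {{schema}} and {{suffix}} are assumed sql_task parameters, substituted as quoted strings before the statement is parsed):

```sql
-- Sketch only: the {{...}} placeholders come from the sql_task parameters
-- map. Since substitution injects them as quoted string literals,
-- IDENTIFIER(...) resolves the concatenated string to a qualified name.
CREATE OR REPLACE MATERIALIZED VIEW
  IDENTIFIER({{catalog}} || '.' || {{schema}} || '.orders_mv' || {{suffix}})
AS
SELECT *
FROM IDENTIFIER({{catalog}} || '.' || {{schema}} || '.orders' || {{suffix}});
```

With {{suffix}} set to an empty string in the prod target, the same file serves both [dev] and production deployments.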