Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can I structure pipeline-specific job params separately in a Databricks Asset Bundle?

azam-io
New Contributor II

Hi all, 
I am working with Databricks Asset Bundles and want to separate environment-specific job params (for example, for "dev" and "prod") for each pipeline within my bundle. I need each pipeline to have its own job param values for different environments in separate files, rather than defining them inside the job YAML file itself.

4 REPLIES

FedeRaimondi
Contributor

Hey @azam-io, you can define the variables in your databricks.yml file for each target (you can define several for each env).

bundle:
  name: your-name
  uuid: id

include:
  - resources/jobs/*.yml
  - resources/experiments/*.yml
  - resources/dashboards/*.yml
  - resources/clusters/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: "host-link-dev"
      root_path: "path"
    run_as:
      service_principal_name: spn
    variables:
      catalog: dev_catalog
      schema: dev_schema

  prod:
    mode: production
    workspace:
      host: "host-link-prd"
      root_path: "path"
    run_as:
      service_principal_name: spn
    variables:
      catalog: prd_catalog
      schema: prd_schema

# Declare the variables at the top level; the target-specific values above override these defaults.
variables:
  catalog:
    description: Catalog name.
    default: dev_catalog
  schema:
    description: Schema name.
    default: dev_schema

Then, in your pipeline or other resource YAML files, simply refer to the variables with:

${var.catalog}
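
For example, a job YAML under resources/jobs/ could pass them through as job parameters (the job name, task, and notebook path here are just placeholders):

resources:
  jobs:
    my_job:
      name: my_job
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
          # compute configuration omitted for brevity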

 

azam-io
New Contributor II

Hi Fede, thanks for your response. What I'm aiming for is to keep the variables separated by job and, within each job, by environment. For example, I envision a folder structure under the resources directory where each job has its own folder, and inside that folder there are separate files for the main job definition, development parameters, production parameters, etc.
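
Something along these lines, where the folder and file names are just illustrative:

resources/
  job_a/
    job_a.yml        # main job definition
    params.dev.yml   # development parameter values
    params.prod.yml  # production parameter values
  job_b/
    job_b.yml
    params.dev.yml
    params.prod.yml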

FedeRaimondi
Contributor

Hello @azam-io,

From what I know, variables need to be defined in the databricks.yml file (I've never tried otherwise, to be fair). Since you still want your variables to be environment-dependent, I believe there are a few options.

One option could be using dotenv files, or pointing at some other configuration files (maybe in Volumes) where you store the parameters and read the file in your job.

Or, keeping the structure you envision: define all the variables for all your jobs; you may be able to leverage complex variables:

variables:
  job_x_params:
    description: 'My job params'
    type: complex
    default:
      param1: 'value1'
      param2: 'value2'
      param3:
        param3.1: true
        param3.2: false
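
If it helps, one way to consume such a variable (the names here are placeholders, and this assumes a CLI version that supports type: complex) is to shape it as the job's parameters list and substitute it wholesale:

variables:
  job_x_params:
    description: 'Parameters for job_x'
    type: complex
    default:
      - name: param1
        default: 'value1'
      - name: param2
        default: 'value2'

resources:
  jobs:
    job_x:
      name: job_x
      parameters: ${var.job_x_params}  # the whole list is substituted here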

Then you can store a variable-overrides.json file for each environment. There's an example of this implementation in this other thread: Solved: Re: How to use variable-overrides.json for environ... - Databricks Community - 125126
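
As a rough sketch of that approach (the values are placeholders), you would keep one override file per target, typically under .databricks/bundle/<target>/variable-overrides.json, e.g. for dev:

{
  "job_x_params": {
    "param1": "dev_value1",
    "param2": "dev_value2"
  }
}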

In my view, this can become quite hard to manage as the number of jobs and parameters grows... Storing the job parameters in configuration files would probably be cleaner, and the asset bundle variables can then just hold the paths to those files.
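
As a sketch of that last idea (the paths and names are made up, and everything is shown in one file only for brevity), each target points the variable at a different config file and the job merely receives the path:

variables:
  job_x_config:
    description: Path to job_x's parameter file.
    default: conf/job_x/dev.yml

targets:
  dev:
    variables:
      job_x_config: conf/job_x/dev.yml
  prod:
    variables:
      job_x_config: conf/job_x/prod.yml

resources:
  jobs:
    job_x:
      name: job_x
      parameters:
        - name: config_path
          default: ${var.job_x_config}  # the notebook or script loads this file at runtime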

Hope this helps; otherwise, could you maybe share some example parameters?

Michał
New Contributor II

Hi azam-io, were you able to solve your problem? 

Are you trying to have different parameters depending on the environment, or a different parameter value?
I think targets would allow you to specify different parameters per environment / target.
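
For example (the job key and values here are placeholders), a target can override a specific job's parameters directly, without going through variables:

resources:
  jobs:
    job_x:
      name: job_x
      # tasks omitted for brevity

targets:
  dev:
    resources:
      jobs:
        job_x:
          parameters:
            - name: env
              default: dev
  prod:
    resources:
      jobs:
        job_x:
          parameters:
            - name: env
              default: prod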

As for the parameter values, I have solved this problem by using variables. I have a config file which I read before running the Databricks CLI; this converts the configured values into environment variables, and I use them to set the bundle variables when executing `databricks bundle`. That, unfortunately, very quickly becomes a lot to do and it's easy to miss things, but to solve that I wrapped it all in a Drone CI pipeline. Now I can run the deployment locally with a single command which, behind the scenes, runs validation, deployment, and on top of that a few extra steps; for example, the dev deployment automatically starts the pipeline.
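
In case it is useful, a stripped-down version of that kind of Drone step could look like the following; the image, secret names, and variable are placeholders, and it assumes the CLI's support for passing bundle variables through BUNDLE_VAR_<name> environment variables:

kind: pipeline
type: docker
name: deploy-dev

steps:
  - name: deploy-bundle
    image: ghcr.io/databricks/cli:latest  # any image with the Databricks CLI installed
    environment:
      DATABRICKS_HOST:
        from_secret: databricks_host
      DATABRICKS_TOKEN:
        from_secret: databricks_token
      BUNDLE_VAR_catalog: dev_catalog  # sets the bundle variable "catalog"
    commands:
      - databricks bundle validate -t dev
      - databricks bundle deploy -t dev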