
Variables in databricks.yml "include:" - Asset Bundles

nickneoners
New Contributor II

Hi,

We've got an app that we deploy to multiple customers' workspaces.

We're looking to transition to asset bundles. We would like to structure our resources like:

src/
resources/
|-- customer_1/
|   |-- job_1
|   |-- job_2
|-- customer_2/
|   |-- job_3
|   |-- job_4
|-- customer_3/
|   |-- job_5
|   |-- job_6
databricks.yml

So each customer only gets their specific workflows.

In `databricks.yml` we would love to do:

bundle:
  name: my_app

include:
  - resources/${bundle.target}/*.yml

but unfortunately it seems you can't pass any variables into the `include` block.

Any ideas?

Thanks!

10 REPLIES

nickneoners
New Contributor II

Just to add to the above: we've got about 30-40 jobs per customer, so we don't want to define these jobs in the `databricks.yml`, which seems to be the only way to work around this issue.

p4pratikjain
Contributor

Interesting use case! Ideally, having a separate bundle for each customer seems like a clean solution. But if you don't want that, you can just include all the YAML files in databricks.yml with:

include: 
  - resources/*/*.yml

Inside the YAML files, handle the different workspaces under different targets. `targets` is a top-level node that is merged across files, so you can have 'customer_1' and 'customer_2' as different targets.

This is not the intended use of targets, but I guess it will solve the ask.
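A minimal sketch of that workaround, assuming each customer maps to its own target (the hosts and job names below are illustrative, not from the thread):

# databricks.yml
bundle:
  name: my_app

include:
  - resources/*/*.yml

targets:
  customer_1:
    workspace:
      host: https://customer-1.cloud.databricks.com
  customer_2:
    workspace:
      host: https://customer-2.cloud.databricks.com

# resources/customer_1/job_1.yml: the job lives under the target,
# so it is only deployed when deploying that target
targets:
  customer_1:
    resources:
      jobs:
        job_1:
          name: job_1

Running `databricks bundle deploy -t customer_1` would then pick up only the jobs defined under that target.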

Pratik Jain

pietern
Databricks Employee

If the jobs you're defining per customer are completely different and don't share anything (e.g. some base configuration), then using targets for this purpose is not a great fit.

If you're looking to share files between these bundles (maybe some notebooks, maybe a wheel build, maybe some Python files, etc), then you could look into this example: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/share_files_across_bundles

This enables you to define a separate, isolated bundle per customer, with a different deployment schedule for each one, while still sharing the same set of files between these deployments.
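A hedged sketch of what that layout could look like (directory and bundle names here are illustrative; see the linked example for the authoritative setup):

shared/                  # notebooks, wheel sources, Python files used by all customers
customer_1/
|-- databricks.yml       # separate bundle for customer 1
|-- resources/
customer_2/
|-- databricks.yml       # separate bundle for customer 2
|-- resources/

# customer_1/databricks.yml
bundle:
  name: my_app_customer_1

sync:
  paths:
    - .
    - ../shared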


Breno_Ribeiro
New Contributor II

I have a similar use case. We have two different hosts for Databricks, EU and NA. In some cases we need to deploy a similar job to both hosts. To solve that, here's what I did:
- In the jobs folder I created a different job file for each host. In addition, I created an empty job file named job.yml:

src/
jobs/
|-- job_EU.yml   (job configuration for EU dbx)
|-- job_NA.yml   (job configuration for NA dbx)
|-- job.yml      (empty file)

- In databricks.yml I include only the empty file:

include:
    - jobs/job.yml

- Finally, in the CI workflow file, I overwrite the empty file with the content of the host-specific job file, depending on the host I want to deploy to and run on. This must be done in each CI job that calls a databricks bundle command; if the commands are in the same job, it's only needed once:

deploy:
   # ...
   steps:
      # ...
      - run: cat jobs/job_EU.yml > jobs/job.yml
      - run: databricks bundle deploy

Hi Breno,

If the jobs you're deploying are similar, it sounds like a good fit for target overrides. You'd define the core of the job just once, and specialize it on a per-target basis. For example, you could define an "eu" and an "na" target, each pointing to their own workspace, and specialize the job in the target overrides section. You can find an example of how to do this for job clusters here: https://docs.databricks.com/en/dev-tools/bundles/cluster-override.html#example-2-conflicting-new-job... . All job properties can be overridden in a similar way.
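A minimal sketch of that pattern, with hypothetical hosts and a single illustrative task (none of these names come from the thread):

# databricks.yml
bundle:
  name: my_app

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/main.py

targets:
  eu:
    workspace:
      host: https://eu-workspace.cloud.databricks.com
    resources:
      jobs:
        my_job:
          name: my_job_eu   # per-target override, merged with the base definition
  na:
    workspace:
      host: https://na-workspace.cloud.databricks.com
    resources:
      jobs:
        my_job:
          name: my_job_na

With this, `databricks bundle deploy -t eu` and `-t na` deploy the same core job, each specialized for its own workspace, and the empty-file copying step is no longer needed.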

_escoto
New Contributor II

@pietern, while @nickneoners is using this for customers, we need this exact same functionality to safeguard environments. We have workflows, such as integration tests or pipelines still in development, that we don't want to deploy to either Staging or Production. Therefore, we need this functionality as well.

pietern
Databricks Employee

@_escoto You can achieve this by defining the entire job in the "targets" section for your development target. Resources defined in the top-level "resources" block are deployed to all targets. Resources defined both in the top-level "resources" block _and_ in the "targets" block are merged, where the configuration specified in the target override takes precedence. Resources defined exclusively in the "targets" block are deployed only to that target.
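A minimal sketch of a dev-only job, assuming a target named "dev" (the job name and paths are illustrative):

targets:
  dev:
    resources:
      jobs:
        integration_tests:
          name: integration_tests
          tasks:
            - task_key: run_tests
              notebook_task:
                notebook_path: ./tests/integration.py

Because `integration_tests` does not appear in the top-level "resources" block, it is deployed only when targeting dev.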

_escoto
New Contributor II

Uhh, I would have to have a deep look at that set-up... we have over 50 workflows, and some of them have reached the maximum number of tasks allowed per workflow... so I guess if this fixes the problem, maybe it's OK... but I think a more streamlined solution would be better.

Because now I am thinking: what happens if I want to promote a workflow? Do I have to copy-paste it to the other target, meaning code/configuration duplication? Can I have it deployed to dev/stg/prd at the same time, or does it become environment-dependent, so that the moment I promote a workflow I can no longer work on it in development?

Meanwhile, I created this feature request:
[Feature Request] DABs: allow target specific includes · Issue #2878 · databricks/cli

pietern
Databricks Employee
Databricks Employee

To have a job in both development and production, we continue to recommend defining it at the top level of the configuration, under the "resources" field. What you're looking for, if I understand correctly, is to deploy a job only to dev+stg but not prod, or to stg+prod but not dev, and that is not possible out of the box.

Today, you can either use YAML anchors to duplicate the job across the targets you select, or live with the additional job in a target where you don't want it and simply not run it or pause its schedule.
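A minimal sketch of the anchors approach, assuming both targets are defined in the same file (YAML anchors don't cross file boundaries; the job definition is illustrative):

targets:
  dev:
    resources:
      jobs:
        my_job: &my_job        # define the job once and anchor it
          name: my_job
          tasks:
            - task_key: main
              notebook_task:
                notebook_path: ./src/main.py
  staging:
    resources:
      jobs:
        my_job: *my_job        # reuse the same definition; prod simply omits it

The anchor is resolved by the YAML parser before the bundle is evaluated, so dev and staging end up with identical job definitions while prod gets none.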

Thanks for creating the issue. This is feedback I have seen more of, and we'll discuss internally to see if and how we can accommodate it.