Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Use the include property to specify files for a particular workspace using DABs

jeremy98
New Contributor III

Hello, community,
Is there a field in the YAML file used with DABs to specify files based on the workspace in use? For example, if I want to deploy notebooks and workflows for staging, they need to be a set of resources that differ from those in production.


6 REPLIES

Walter_C
Databricks Employee

Yes, you can specify different sets of resources for different environments (such as staging and production) in the YAML file used with Databricks Asset Bundles (DABs). This is achieved using the targets mapping in the databricks.yml file.

https://docs.databricks.com/en/dev-tools/bundles/settings.html#targets 
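For instance, a minimal sketch of a databricks.yml using that approach (the bundle name, job keys, notebook paths, and workspace hosts below are hypothetical, not from this thread):

bundle:
  name: my_bundle

targets:
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com  # hypothetical host
    resources:
      jobs:
        staging_sync_job:        # deployed only when targeting staging
          name: staging-sync-job
          tasks:
            - task_key: main     # cluster settings omitted for brevity
              notebook_task:
                notebook_path: ./notebooks/staging_sync.py
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com     # hypothetical host
    resources:
      jobs:
        prod_sync_job:           # deployed only when targeting prod
          name: prod-sync-job
          tasks:
            - task_key: main
              notebook_task:
                notebook_path: ./notebooks/prod_sync.py

Deploying with databricks bundle deploy -t staging would then create only the staging job in the staging workspace.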

jeremy98
New Contributor III

Thanks for your response.
Actually, in my databricks.yaml I have this declaration:

include:
#- resources/sync_delta_and_db.job.yml
- resources/sync_data_from_prod_to_staging.job.yml

I don't want to also deploy the first resource to the staging environment, because it is a workflow strictly needed in production and not in staging... so how can I exclude it without commenting out that line?

Hi @jeremy98 ,

It should be done in the way @Walter_C suggested. Define the core of the workflow just once, and specialize it on a per-target basis. For example, you could define a "dev" and a "test" target, each pointing to its own workspace, and specialize the job in the target overrides section.

Hello!
You mean doing it this way:

include:
 - resources/${bundle.target}/*.yml

I'm not sure I understood the point; can you show me a little snippet of the code? Thanks a lot for the answers, guys!

Hi @jeremy98 ,

Unfortunately, you cannot use variables in the include mapping. What I was trying to suggest is that you place resources with "common logic" shared across all environments in the YAML file that you pass to the include mapping. Environment-specific settings/workflows should then be overridden/added in the targets mapping of the individual environments.

For example, have a look at Example 1 from the documentation below. It defines a job in the top-level resources mapping (it could just as well come from the include mapping; it doesn't matter for the sake of the example).
Next, for the development environment it adds some configuration only for dev, basically overriding the "core" job logic that comes from the top-level resources. You can also add a completely new job definition in the targets mapping that will only be deployed to a specific environment.
So, when a target mapping specifies a workspace, artifacts, or resources mapping, and a top-level workspace, artifacts, or resources mapping also exists, any conflicting settings are overridden by the settings within the target.

(screenshot: Example 1 from the bundle settings documentation)
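The screenshot is roughly equivalent to the following sketch (reconstructed from memory of the documentation, so exact field values may differ): a job defined once at the top level, with a dev target layering extra cluster settings on top of it.

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          new_cluster:
            node_type_id: i3.xlarge    # "core" setting shared by every target

targets:
  dev:
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              new_cluster:
                num_workers: 1         # dev-only setting, merged into the core job

When you deploy with -t dev, the tasks are matched by task_key and merged, so the resulting job cluster has both node_type_id and num_workers; for conflicting keys, the target's value wins.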

PS: The problem you're dealing with is quite common. It would be great if the Databricks team added the ability to use variables in the include mapping, or allowed overriding the include mapping in the targets mapping.
Anyway, you can also take a look at the threads below and try different approaches. Maybe you can use the sync mapping in a clever way?

Variables in databricks.yml "include:" - Asset Bun... - Databricks Community - 78893
Databricks Bundles - How to select which jobs reso... - Databricks Community - 62023
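Applied to the include block from earlier in this thread, the pattern could look roughly like this (a sketch only; the workspace hosts and the job's internals are hypothetical): keep the shared job in include, and define the prod-only workflow directly under the prod target so it is never deployed to staging.

include:
  - resources/sync_data_from_prod_to_staging.job.yml  # shared definition

targets:
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com  # hypothetical
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com     # hypothetical
    resources:
      jobs:
        sync_delta_and_db:    # defined only under prod, so staging deploys skip it
          name: sync_delta_and_db
          tasks:
            - task_key: sync  # cluster settings omitted for brevity
              notebook_task:
                notebook_path: ./notebooks/sync_delta_and_db.py  # hypothetical path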

Thanks for the answer, very helpful! Basically, what I actually did was pass an environment variable at job run time and raise an error if the environment is not the proper one :(. I'll check which of these could be a better solution. It is also possible to create two environment-specific folders inside the project, but that means each target would have only one workspace.

 
