Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Do not deploy all notebooks to the given environment

maikel
Contributor II

Hello Community!

What is the best way to avoid deploying some notebooks from the asset bundle to higher environments?
Given I have the following resource structure:

resources/
  └── jobs/
      ├── notebook_a.yml
      ├── notebook_b.yml       ← dev only
      ├── notebook_c.yml       ← dev only
      └── notebook_d.yml
src/notebooks/
  └── jobs/
      ├── notebook_a.py
      ├── notebook_b.py        ← dev only
      ├── notebook_c.py        ← dev only
      └── notebook_d.py

We have three envs: dev, pre-prod and prod.
I would like to avoid deploying notebooks b and c to pre-prod and prod. How can I handle this cleanly in the databricks.yml file?

Thanks a lot!


5 REPLIES

Ashwin_DSA
Databricks Employee

Hi @maikel,

Have you considered using target‑scoped resources so that the jobs for specific notebooks only exist in the dev target and are simply not defined for pre-prod and prod?

Databricks Asset Bundles let you put a targets: block inside each resource file, and only the resources listed for the active target are deployed.

The Databricks Asset Bundles documentation explains how databricks.yml is structured and how targets can override or add their own resources. This is the basis for "only define some resources for certain targets."
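As a minimal sketch of the idea (host URLs are taken from the thread; the job body is trimmed and the task layout is a placeholder, not your actual definition), a dev-only job can be declared under the dev target instead of at the top level, so it simply does not exist for the other targets:

```yaml
# databricks.yml (sketch)
bundle:
  name: example_bundle

include:
  - resources/jobs/*.yml   # shared jobs deployed to every target

targets:
  dev:
    workspace:
      host: https://dev-env.url.databricks.com
    # Jobs declared under a target exist only for that target,
    # so notebook_b is never deployed to pre-prod or prod.
    resources:
      jobs:
        notebook_b:
          name: notebook_b_${bundle.target}
          tasks:
            - task_key: notebook_b
              notebook_task:
                notebook_path: ./src/notebooks/jobs/notebook_b.py

  preprod:
    workspace:
      host: https://preprod-env.url.databricks.com

  prod:
    workspace:
      host: https://prod-env.url.databricks.com
```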

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

maikel
Contributor II

Hello @Ashwin_DSA ,
thanks for the quick response! I think this is a good direction!

maikel
Contributor II

Hello @Ashwin_DSA ,

I am trying to implement environment distinction for notebooks and have some questions 🙂
My resources:

resources/
  ├── jobs/
  │   ├── notebook_a.yml
  │   ├── notebook_b.yml
  │   ├── notebook_c.yml
  │   └── notebook_d.yml
  └── debug/
      └── notebook_debug.yml

And my databricks.yml:

bundle:
  name: example_bundle

include:
  - resources/jobs/*.yml

targets:
  local:
    mode: development
    default: true
    workspace:
      host: https://dev-env.url.databricks.com

  dev:
    mode: production
    default: false
    workspace:
      host: https://dev-env.url.databricks.com
    permissions:
      - ...

  preprod:
    mode: production
    default: false
    workspace:
      host: https://preprod-env.url.databricks.com


Example notebook_a.yml:

resources:
  jobs:
    notebook_a:
      name: notebook_a_${bundle.target}
      description: some example notebook

      environments:
        - environment_key: default
          spec:
            client: "4"
            dependencies:
              - ../../dist/*.whl

      tasks:
        - task_key: notebook_a
          environment_key: default
          notebook_task:
            notebook_path: ${workspace.file_path}/src/notebooks/notebook_a.py
            source: WORKSPACE
            base_parameters:
              example_parameter: "placeholder"

      max_concurrent_runs: 1

      tags:
        ...

      permissions:
        ...

I would like to have debug/notebook_debug deployed only to local and dev. Is it possible to do this with the shape of databricks.yml I currently have, without having to define the whole resource under the specific targets?

I would like to achieve something like this:

bundle:
  name: example_bundle

include:
  - resources/jobs/*.yml

targets:
  local:
    mode: development
    default: true
    workspace:
      host: https://dev-env.url.databricks.com
    include:
      - resources/debug/*.yml

  dev:
    mode: production
    default: false
    workspace:
      host: https://dev-env.url.databricks.com
    permissions:
      - ...
    include:
      - resources/debug/*.yml

  preprod:
    mode: production
    default: false
    workspace:
      host: https://preprod-env.url.databricks.com

Thanks a lot for the support! 

Ashwin_DSA
Databricks Employee

Hi @maikel,

Unfortunately, you can’t quite do that. include is only allowed at the top level, not inside targets. Per the bundle config spec, include is a top‑level mapping, and there’s no per‑target variant.

To get debug jobs only in local + dev without redefining everything in databricks.yml, an option would be to keep databricks.yml as

bundle:
  name: example_bundle

include:
  - resources/jobs/*.yml
  - resources/debug/*.yml
 
Then in resources/debug/notebook_debug.yml:
debug_job_def: &debug_job_def
  resources:
    jobs:
      notebook_debug:
        name: notebook_debug_${bundle.target}
        # full job definition here...

targets:
  local:
    <<: *debug_job_def

  dev:
    <<: *debug_job_def
 

This should result in both local and dev having a notebook_debug. It won't be defined in preprod or any other environment. 
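For clarity, the anchor and merge keys above are plain YAML features, not anything bundle-specific. After the parser resolves `&debug_job_def` and `<<:`, the file is equivalent to writing the job out under each target by hand, which is what makes it visible only to local and dev:

```yaml
# Expanded form of resources/debug/notebook_debug.yml
# (what the YAML parser produces after resolving the anchor/merge keys)
targets:
  local:
    resources:
      jobs:
        notebook_debug:
          name: notebook_debug_${bundle.target}
          # full job definition here...

  dev:
    resources:
      jobs:
        notebook_debug:
          name: notebook_debug_${bundle.target}
          # full job definition here...
```

The anchor just removes the duplication between the two targets; any target not listed (preprod, prod) never receives the merged resources block.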

You may want to explore the pattern illustrated in the Databricks technical blog on customising target deployments.

Does that answer your question?

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

maikel
Contributor II

This is perfect! Thank you very much @Ashwin_DSA !