Databricks Asset Bundles: is it possible to use a different cluster depending on the target (environment)?
11-02-2023 02:49 AM
Here is my bundle definition:
# This is a Databricks asset bundle definition for my_project.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
experimental:
  python_wheel_wrapper: true

bundle:
  name: my_project

include:
  - resources/*.yml

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    compute_id: xxxxx-yyyyyyyy-zzzzzzz
    workspace:

  # Optionally, there could be a 'staging' target here.
  # (See Databricks docs on CI/CD at https://docs.databricks.com/dev-tools/bundles/index.html.)
  #
  # staging:
  #   workspace:

  # The 'prod' target, used for production deployment.
  prod:
    # For production deployments, we only have a single copy, so we override the
    # workspace.root_path default of
    # /Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
    # to a path that is not specific to the current user.
    mode: production
    workspace:
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      # This runs as gonzalomoran@ppg.com in production. Alternatively,
      # a service principal could be used here using service_principal_name
      # (see Databricks documentation).
      user_name: gonzalomoran@ppg.com
My user has no rights to create new clusters, but the job definition tries to create one:
# The main job for my_project
resources:
  jobs:
    my_project_job:
      name: my_project_job

      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam

      email_notifications:
        on_failure:
          - gonzalomoran@ppg.com

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb

        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_project_pipeline.id}

        - task_key: main_task
          depends_on:
            - task_key: refresh_pipeline
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See the Databricks documentation for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 4
Do you know how to make the job use the cluster defined for each target?
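For reference, what I imagine is something along these lines in databricks.yml, where each target overrides the job cluster declared in the job resource file. This is only a sketch and not tested; the node types and worker counts are placeholders I made up:

targets:
  dev:
    mode: development
    resources:
      jobs:
        my_project_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
                num_workers: 1
  prod:
    mode: production
    resources:
      jobs:
        my_project_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
                autoscale:
                  min_workers: 2
                  max_workers: 8

Since my user cannot create clusters, I would probably end up pointing the tasks at an existing cluster instead, but this shows the per-target shape I am after.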
Regards
1 REPLY
12-20-2023 02:59 AM
Hi @Retired_mod,
Could you provide a minimal working example for option 1 or option 3?
I currently have a test job:
"""
resources:
jobs:
my_project_job: #my_project_job_${bundle.target}
name: Asset-bundle-test-job-${bundle.target}
schedule:
quartz_cron_expression: '44 37 8 * * ?'
timezone_id: Europe/Amsterdam
tasks:
- task_key: notebook_task
existing_cluster_id: ${var.my_existing_cluster}
notebook_task:
notebook_path: ../src/notebook_${bundle.target}_test.ipynb
"""
with
"""
variables:
my_existing_cluster:
desciption: Id of my existing Cluster
default: 12345_my_id
"""
and I want to use a different cluster in prod and dev; however, the job that is executed should remain the same.
Any ideas on how I can solve this?
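In case it helps, the direction I am currently trying is to keep the default but set a different value for the variable in each target, roughly like this (the cluster IDs below are placeholders):
"""
variables:
  my_existing_cluster:
    description: Id of my existing cluster
    default: 12345_my_id

targets:
  dev:
    variables:
      my_existing_cluster: 1111-111111-devcluster
  prod:
    variables:
      my_existing_cluster: 2222-222222-prodcluster
"""
The job yaml above would stay unchanged and keep existing_cluster_id: ${var.my_existing_cluster}, so only the variable value differs between dev and prod.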

