Asset Bundles: Dynamic job cluster insertion in jobs
01-15-2024 02:15 AM
Hi!
We are migrating from dbx to asset bundles and are running into problems with dynamically inserting job clusters into the job definition. With dbx we did this nicely with Jinja: all clusters were defined in one place, so a change to a cluster definition automatically propagated to every job, with no duplicated code.
With asset bundles I have tried using variables, and a conf file with the sync option. Neither works: with the conf file, the cluster part of the job is just empty in every scenario, and with variables I can't get a multiline variable passed.
So I am wondering: what is the recommended way to achieve this?
Structure of project:
.
└── bundle/
    ├── resources/
    │   └── job.yaml
    ├── conf/
    │   └── cluster.yaml
    ├── src/
    │   └── test.py
    └── databricks.yaml
Databricks.yaml:
artifacts:
  cluster_file:
    files:
      - source: cluster.yaml
    path: conf
    type: yaml
targets:
  dev:
    mode: production
    default: true
    workspace:
      profile: dev
      host: host.azuredatabricks.net
      root_path: /${bundle.name}/${bundle.git.commit}
      artifact_path: /${bundle.name}/${bundle.git.commit}
    run_as:
      user_name: xxxx
sync:
  include:
    - conf/
Job.yaml:
resources:
  jobs:
    BUNDLE_ARTIFACT_TEST:
      name: ${bundle.target} cluster test
      schedule:
        quartz_cron_expression: 0 30 0 ? * SUN *
        timezone_id: Europe/Amsterdam
        pause_status: UNPAUSED
      tasks:
        - task_key: test_task
          spark_python_task:
            python_file: ../src/test.py
          job_cluster_key: cluster_5_nodes_16gb
          libraries:
            - whl: ../dist/*.whl
      job_clusters:
        ${bundle.name}/${bundle.git.commit}/files/conf/cluster.yaml
cluster.yaml:
- job_cluster_key: cluster_5_nodes_16gb
  new_cluster:
    spark_version: 13.3.x-scala2.12
    node_type_id: Standard_D4s_v5
    spark_env_vars:
      DEVOPS_ARTIFACTS_TOKEN: "{{secrets/devops/artifacts}}"
    runtime_engine: PHOTON
    num_workers: 5
Thanks in advance!
01-17-2024 12:28 AM
Thanks for the reply! I will dive into this, but we would prefer to keep it within the codebase, and I'm not sure this solution will work with multiline job cluster definitions.
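To make concrete what "within the codebase" could look like, here is a rough sketch of a pre-deploy step that copies shared, multiline cluster definitions into a job definition, similar in spirit to what we did with Jinja in dbx. This is not an asset bundle feature: `SHARED_CLUSTERS` and `inject_job_clusters` are hypothetical names, the definitions are plain Python dicts mirroring the cluster.yaml above, and writing the result back out as YAML before deploying is assumed but not shown.

```python
import copy
import json

# Hypothetical shared cluster definitions, mirroring conf/cluster.yaml above.
SHARED_CLUSTERS = [
    {
        "job_cluster_key": "cluster_5_nodes_16gb",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_D4s_v5",
            "spark_env_vars": {
                "DEVOPS_ARTIFACTS_TOKEN": "{{secrets/devops/artifacts}}"
            },
            "runtime_engine": "PHOTON",
            "num_workers": 5,
        },
    }
]


def inject_job_clusters(job, shared_clusters):
    """Return a copy of `job` with job_clusters filled from the shared list.

    Only clusters that a task actually references are included.
    """
    by_key = {c["job_cluster_key"]: c for c in shared_clusters}
    needed = []
    for task in job.get("tasks", []):
        key = task.get("job_cluster_key")
        if key is None or key in needed:
            continue
        if key not in by_key:
            raise KeyError(f"unknown job_cluster_key: {key}")
        needed.append(key)
    result = copy.deepcopy(job)  # leave the input job untouched
    result["job_clusters"] = [copy.deepcopy(by_key[k]) for k in needed]
    return result


# Hypothetical job definition without any cluster block of its own.
job = {
    "name": "cluster test",
    "tasks": [
        {
            "task_key": "test_task",
            "spark_python_task": {"python_file": "../src/test.py"},
            "job_cluster_key": "cluster_5_nodes_16gb",
        }
    ],
}

rendered = inject_job_clusters(job, SHARED_CLUSTERS)
print(json.dumps(rendered, indent=2))
```

The idea is that the cluster definitions stay in one place and every job only names a `job_cluster_key`, so changing a cluster changes all jobs that use it. An unknown key raises an error instead of silently producing an empty cluster section.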

