Asset Bundles: Dynamic job cluster insertion in jobs

Cas
New Contributor III

Hi!

As we are migrating from dbx to Asset Bundles, we are running into problems with the dynamic insertion of job clusters into job definitions. With dbx we handled this nicely with Jinja: all clusters were defined in one place, a change to a cluster definition automatically propagated to all jobs, and there was no need to duplicate code.

With Asset Bundles I have tried using variables, and also a conf file together with the sync option. Neither works: with the conf file, the cluster part of the job is simply empty in every scenario, and with variables I can't pass a multiline value.

So I am wondering: what is the recommended way to achieve this?

Structure of project:

 

.
└── bundle/
    ├── resources/
    │   └── job.yaml
    ├── conf/
    │   └── cluster.yaml
    ├── src/
    │   └── test.py
    └── databricks.yaml

 

databricks.yaml:

 

artifacts:
  cluster_file:
    files:
      - source: cluster.yaml
    path: conf
    type: yaml

targets:
  dev:
    mode: production
    default: true
    workspace:
      profile: dev
      host: host.azuredatabricks.net
      root_path: /${bundle.name}/${bundle.git.commit}
      artifact_path: /${bundle.name}/${bundle.git.commit}
    run_as:
      user_name: xxxx
    sync:
      include:
        - conf/

 

job.yaml:

 

resources:
  jobs:
    BUNDLE_ARTIFACT_TEST:
      name:  ${bundle.target} cluster test
      schedule:
        quartz_cron_expression: 0 30 0 ? * SUN *
        timezone_id: Europe/Amsterdam
        pause_status: UNPAUSED
      tasks:
        - task_key: test_task
          spark_python_task:
            python_file: ../src/test.py
          job_cluster_key: cluster_5_nodes_16gb
          libraries:
            - whl: ../dist/*.whl
      job_clusters:
        ${bundle.name}/${bundle.git.commit}/files/conf/cluster.yaml

 

cluster.yaml:

 

  - job_cluster_key: cluster_5_nodes_16gb
    new_cluster:
      spark_version: 13.3.x-scala2.12
      node_type_id: Standard_D4s_v5
      spark_env_vars:
        DEVOPS_ARTIFACTS_TOKEN: "{{secrets/devops/artifacts}}"
      runtime_engine: PHOTON
      num_workers: 5
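
For readers hitting the same issue: a bundle cannot splice an external YAML file into `job_clusters` via a path reference the way the job definition above attempts. One pattern that recent versions of the Databricks CLI do support is a complex-typed bundle variable, defined once in `databricks.yml` and referenced from every job. A minimal sketch, assuming a recent CLI version; the variable name `shared_job_clusters` is illustrative, not an official name:

```yaml
# databricks.yml -- define the cluster list once as a complex variable
variables:
  shared_job_clusters:
    description: Job cluster definitions shared by all jobs
    type: complex
    default:
      - job_cluster_key: cluster_5_nodes_16gb
        new_cluster:
          spark_version: 13.3.x-scala2.12
          node_type_id: Standard_D4s_v5
          runtime_engine: PHOTON
          num_workers: 5

# resources/job.yaml -- reference the variable instead of duplicating YAML
resources:
  jobs:
    BUNDLE_ARTIFACT_TEST:
      job_clusters: ${var.shared_job_clusters}
```

Because `${var.shared_job_clusters}` resolves to the whole mapping rather than a string, no multiline string passing is involved; changing the variable's default changes every job that references it.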

 

Thanks in advance!

3 REPLIES

Kaniz
Community Manager

Cas
New Contributor III

Thanks for the reply! I will dive into this, but we would prefer to keep it within the codebase, and I'm not sure this solution will work with multiline job cluster definitions.
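
On the multiline concern, a hedged sketch: a complex-typed bundle variable (supported in recent Databricks CLI versions) keeps the entire cluster block inside the codebase and can be overridden per target, so no multiline value ever needs to be passed as a string. All names below are illustrative assumptions:

```yaml
# databricks.yml -- full cluster block as a complex variable,
# with a per-target override (names are illustrative)
variables:
  shared_job_clusters:
    type: complex
    default:
      - job_cluster_key: cluster_5_nodes_16gb
        new_cluster:
          spark_version: 13.3.x-scala2.12
          node_type_id: Standard_D4s_v5
          num_workers: 5

targets:
  dev:
    variables:
      shared_job_clusters:
        - job_cluster_key: cluster_5_nodes_16gb
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D4s_v5
            num_workers: 1  # smaller cluster in dev
```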

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!