<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Declarative Automation Bundle - Reusable job_cluster configuration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156097#M54352</link>
    <description>&lt;P&gt;Hi everyone, quick comment on my Friday post for relevance. I would appreciate any help on this case.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 04 May 2026 16:35:48 GMT</pubDate>
    <dc:creator>ChristianRRL</dc:creator>
    <dc:date>2026-05-04T16:35:48Z</dc:date>
    <item>
      <title>Declarative Automation Bundle - Reusable job_cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/155976#M54336</link>
      <description>&lt;P&gt;Hi there, running into some trouble abstracting job_clusters configurations to improve reusability. At the moment, I have many job yaml files that require the following configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ChristianRRL_0-1777669403132.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/26575iDBB638CE350FD329/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ChristianRRL_0-1777669403132.png" alt="ChristianRRL_0-1777669403132.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;What would be the best approach(es) to remove this configuration from every job yaml file? Currently, we already have the following kinds of yaml files:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;base_config.yml&lt;/LI&gt;&lt;LI&gt;databricks.yml&lt;/LI&gt;&lt;LI&gt;Many "job_name".yml&lt;UL&gt;&lt;LI&gt;NOTE: Working version has the job_clusters configuration set for each individual yaml file&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;meta_variables.yml&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I did try creating a new `cluster_definitions.yml` as follows:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Centralized cluster definitions for all fleet jobs.
# YAML anchors define reusable cluster profiles; each job references them via merge keys.
# DAB deep-merges these job_clusters with the tasks/parameters in individual fleet_*.yml files.

x-cluster-base: &amp;amp;cluster_base
  spark_version: 16.4.x-scala2.12
  spark_conf:
    spark.databricks.cluster.profile: singleNode
    spark.master: "local[*]"
    spark.databricks.optimizer.collapseWindows.enabled: "false"
  node_type_id: Standard_E4ds_v4
  num_workers: 2
  azure_attributes:
    availability: ON_DEMAND_AZURE
    first_on_demand: 1
    spot_bid_max_price: -1
  # spark_env_vars:
    # ...

resources:
  jobs:
    fleet_wtg_ge_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base
    fleet_wtg_ge_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base
    fleet_wtg_sgre_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base
    fleet_wtg_sgre_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base
    fleet_wtg_vestas_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base
    fleet_wtg_vestas_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            &amp;lt;&amp;lt;: *cluster_base&lt;/LI-CODE&gt;&lt;P&gt;But when I tried running the deployment, I got the following error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Error: multiple resources have been defined with the same key: fleet_wtg_sgre_curated
  at jobs.fleet_wtg_sgre_curated
  in cluster_definitions.yml:59:7
     fleet_wtg_sgre_curated.yml:4:7

Error: multiple resources have been defined with the same key: feature_job_compute_cluster_fleet_wtg_ge_silver
  at jobs.feature_job_compute_cluster_fleet_wtg_ge_silver
  in fleet_wtg_ge_silver-v2.yml:4:7
     fleet_wtg_ge_silver-v3.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_vestas_silver
  at jobs.fleet_wtg_vestas_silver
  in cluster_definitions.yml:64:7
     fleet_wtg_vestas_silver.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_vestas_curated
  at jobs.fleet_wtg_vestas_curated
  in cluster_definitions.yml:69:7
     fleet_wtg_vestas_curated.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_ge_silver
  at jobs.fleet_wtg_ge_silver
  in cluster_definitions.yml:44:7
     fleet_wtg_ge_silver.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_ge_curated
  at jobs.fleet_wtg_ge_curated
  in cluster_definitions.yml:49:7
     fleet_wtg_ge_curated.yml:4:7&lt;/LI-CODE&gt;&lt;P&gt;Would appreciate some help on this one!&lt;/P&gt;</description>
      <pubDate>Fri, 01 May 2026 21:12:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/155976#M54336</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2026-05-01T21:12:30Z</dc:date>
    </item>
    <item>
      <title>Re: Declarative Automation Bundle - Reusable job_cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156097#M54352</link>
      <description>&lt;P&gt;Hi everyone, quick comment on my Friday post for relevance. I would appreciate any help on this case.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 16:35:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156097#M54352</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2026-05-04T16:35:48Z</dc:date>
    </item>
    <item>
      <title>Re: Declarative Automation Bundle - Reusable job_cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156113#M54358</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;&lt;/P&gt;&lt;P&gt;I believe the issue comes from cluster_definitions.yml: it not only defines a reusable cluster profile, it also redefines the same jobs that already exist in the individual fleet_*.yml files.&lt;/P&gt;&lt;P&gt;Why? Because in Databricks Asset Bundles, each entry under:&lt;/P&gt;&lt;PRE&gt;resources:
  jobs:
    &amp;lt;job_key&amp;gt;:&lt;/PRE&gt;&lt;P&gt;must be unique in the final resolved bundle.&lt;/P&gt;&lt;P&gt;So if fleet_wtg_ge_silver exists in both fleet_wtg_ge_silver.yml and cluster_definitions.yml, the bundle sees two resources with the same key and fails.&lt;/P&gt;&lt;P&gt;I tried to replicate your issue and got the same error.&lt;/P&gt;&lt;P&gt;Databricks supports splitting bundle configuration across multiple YAML files using include, but the included files are combined into one bundle configuration, so you cannot define the same top-level job resource twice.&lt;/P&gt;&lt;P&gt;A better approach is to define the cluster as a complex variable and reference it from each job.&lt;/P&gt;&lt;PRE&gt;# cluster_definitions.yml

variables:
  fleet_job_cluster:
    description: Shared fleet job cluster definition
    type: complex
    default:
      spark_version: 16.4.x-scala2.12
      node_type_id: Standard_E4ds_v4
      num_workers: 2
      azure_attributes:
        availability: ON_DEMAND_AZURE
        first_on_demand: 1
        spot_bid_max_price: -1
      spark_conf:
        spark.databricks.cluster.profile: singleNode
        spark.master: "local[*]"
        spark.databricks.optimizer.collapseWindows.enabled: "false"&lt;/PRE&gt;&lt;P&gt;then in each job file:&lt;/P&gt;&lt;PRE&gt;resources:
  jobs:
    fleet_wtg_ge_silver:
      name: fleet_wtg_ge_silver

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster: ${var.fleet_job_cluster}

      tasks:
        - task_key: Silver
          notebook_task:
            notebook_path: ../src/fleet/wtg_ge/silver/Silver.py
            base_parameters:
              task_name: "{{task.name}}"
            source: WORKSPACE
          job_cluster_key: job_cluster&lt;/PRE&gt;&lt;P&gt;Alternatively, you could use YAML anchors, but anchors are only practical when the anchor and its usage are in the same YAML document, so they are not a good cross-file reuse mechanism for this case.&lt;/P&gt;&lt;P&gt;Also, this part caught my eye:&lt;/P&gt;&lt;PRE&gt;fleet_wtg_ge_silver-v2.yml
fleet_wtg_ge_silver-v3.yml&lt;/PRE&gt;&lt;P&gt;The bundle include pattern is picking up multiple versions of the same job (both files define the key feature_job_compute_cluster_fleet_wtg_ge_silver), so try to clean up the include pattern or move old test versions outside the included folder:&lt;/P&gt;&lt;PRE&gt;include:
  - resources/jobs/*.yml
  - resources/common/*.yml&lt;/PRE&gt;&lt;P&gt;and avoid including archived files such as *-v2.yml and *-v3.yml.&lt;/P&gt;&lt;P&gt;So you can use a structure like:&lt;/P&gt;&lt;PRE&gt;databricks.yml
resources/
  common/
    cluster_definitions.yml
  jobs/
    fleet_wtg_ge_silver.yml
    fleet_wtg_ge_curated.yml
    fleet_wtg_sgre_silver.yml&lt;/PRE&gt;&lt;P&gt;with:&lt;/P&gt;&lt;PRE&gt;# databricks.yml
include:
  - resources/common/*.yml
  - resources/jobs/*.yml&lt;/PRE&gt;</description>
      <pubDate>Mon, 04 May 2026 21:33:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156113#M54358</guid>
      <dc:creator>amirabedhiafi</dc:creator>
      <dc:date>2026-05-04T21:33:15Z</dc:date>
    </item>
    <item>
      <title>Re: Declarative Automation Bundle - Reusable job_cluster configuration</title>
      <link>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156317#M54402</link>
      <description>&lt;P&gt;Using a complex variable is the right suggestion. Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2026 18:47:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/declarative-automation-bundle-reusable-job-cluster-configuration/m-p/156317#M54402</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2026-05-06T18:47:50Z</dc:date>
    </item>
  </channel>
</rss>

