
Declarative Automation Bundle - Reusable job_cluster configuration

ChristianRRL
Honored Contributor

Hi there, I'm running into some trouble abstracting the job_clusters configuration to improve reusability. At the moment, I have many job yaml files that each require the following configuration:

ChristianRRL_0-1777669403132.png

What would be the best approach(es) to remove this configuration from every job yaml file? Currently, we already have the following kinds of yaml files:

  • base_config.yml
  • databricks.yml
  • Many "job_name".yml
    • NOTE: Working version has the job_clusters configuration set for each individual yaml file
  • meta_variables.yml

I did try creating a new `cluster_definitions.yml` as follows:

# Centralized cluster definitions for all fleet jobs.
# YAML anchors define reusable cluster profiles; each job references them via merge keys.
# DAB deep-merges these job_clusters with the tasks/parameters in individual fleet_*.yml files.

x-cluster-base: &cluster_base
  spark_version: 16.4.x-scala2.12
  spark_conf:
    spark.databricks.cluster.profile: singleNode
    spark.master: "local[*]"
    spark.databricks.optimizer.collapseWindows.enabled: "false"
  node_type_id: Standard_E4ds_v4
  num_workers: 2
  azure_attributes:
    availability: ON_DEMAND_AZURE
    first_on_demand: 1
    spot_bid_max_price: -1
  # spark_env_vars:
    # ...

resources:
  jobs:
    fleet_wtg_ge_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base
    fleet_wtg_ge_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base
    fleet_wtg_sgre_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base
    fleet_wtg_sgre_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base
    fleet_wtg_vestas_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base
    fleet_wtg_vestas_curated:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            <<: *cluster_base

But when I tried running the deployment, I got the following error:

Error: multiple resources have been defined with the same key: fleet_wtg_sgre_curated
  at jobs.fleet_wtg_sgre_curated
  in cluster_definitions.yml:59:7
     fleet_wtg_sgre_curated.yml:4:7

Error: multiple resources have been defined with the same key: feature_job_compute_cluster_fleet_wtg_ge_silver
  at jobs.feature_job_compute_cluster_fleet_wtg_ge_silver
  in fleet_wtg_ge_silver-v2.yml:4:7
     fleet_wtg_ge_silver-v3.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_vestas_silver
  at jobs.fleet_wtg_vestas_silver
  in cluster_definitions.yml:64:7
     fleet_wtg_vestas_silver.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_vestas_curated
  at jobs.fleet_wtg_vestas_curated
  in cluster_definitions.yml:69:7
     fleet_wtg_vestas_curated.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_ge_silver
  at jobs.fleet_wtg_ge_silver
  in cluster_definitions.yml:44:7
     fleet_wtg_ge_silver.yml:4:7

Error: multiple resources have been defined with the same key: fleet_wtg_ge_curated
  at jobs.fleet_wtg_ge_curated
  in cluster_definitions.yml:49:7
     fleet_wtg_ge_curated.yml:4:7

Would appreciate some help on this one!
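
For reference, one direction that may be worth testing: the errors above show each job key defined both in cluster_definitions.yml and in its own job file, which suggests the bundle does not merge same-key resources across included files. YAML anchors are also scoped to a single file, so `&cluster_base` cannot be referenced from the individual job files. The sketch below instead puts the cluster spec in a complex variable and references it from each job. It assumes a Databricks CLI version that supports `type: complex` variables; the variable name `fleet_cluster` is hypothetical, and the cluster values are copied from the attempt above.

```yaml
# databricks.yml (assumption: variables are declared here; fleet_cluster is a hypothetical name)
variables:
  fleet_cluster:
    description: Shared job cluster spec for the fleet jobs
    type: complex
    default:
      spark_version: 16.4.x-scala2.12
      spark_conf:
        spark.databricks.cluster.profile: singleNode
        spark.master: "local[*]"
        spark.databricks.optimizer.collapseWindows.enabled: "false"
      node_type_id: Standard_E4ds_v4
      num_workers: 2
      azure_attributes:
        availability: ON_DEMAND_AZURE
        first_on_demand: 1
        spot_bid_max_price: -1
---
# fleet_wtg_ge_silver.yml (the job key is defined only here, not in a second file)
resources:
  jobs:
    fleet_wtg_ge_silver:
      job_clusters:
        - job_cluster_key: job_cluster
          # Substitutes the whole mapping defined under variables.fleet_cluster
          new_cluster: ${var.fleet_cluster}
      # existing tasks and parameters stay as they are
```

With this layout each job file still carries the short job_clusters stanza, but the cluster spec itself lives in one place, and cluster_definitions.yml would no longer redefine the job keys. Separately, the second error lists fleet_wtg_ge_silver-v2.yml and fleet_wtg_ge_silver-v3.yml as both defining the same key, which looks unrelated to cluster_definitions.yml.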
