Databricks asset bundles: is it possible to use a different cluster depending on the target (environment)?

ashdam
New Contributor III

Here is my bundle definition:
# This is a Databricks asset bundle definition for my_project.

experimental:
  python_wheel_wrapper: true

bundle:
  name: my_project

include:
  - resources/*.yml

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    compute_id: xxxxx-yyyyyyyy-zzzzzzz
    workspace:

  # Optionally, there could be a 'staging' target here.
  # (See Databricks docs on CI/CD at https://docs.databricks.com/dev-tools/bundles/index.html.)
  #
  # staging:
  #  workspace:

  # The 'prod' target, used for production deployment.
  prod:
    # For production deployments, we only have a single copy, so we override the
    # workspace.root_path default of
    # /Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
    # to a path that is not specific to the current user.
    mode: production
    workspace:
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      # This runs as gonzalomoran@ppg.com in production. Alternatively,
      # a service principal could be used here using service_principal_name
      # (see Databricks documentation).
      user_name: gonzalomoran@ppg.com
 
My user has no rights to create new clusters, but the job definition tries to create a new one.
# The main job for my_project
resources:
  jobs:
    my_project_job:
      name: my_project_job

      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam

      email_notifications:
        on_failure:
          - gonzalomoran@ppg.com

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb
       
        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_project_pipeline.id}
       
        - task_key: main_task
          depends_on:
            - task_key: refresh_pipeline
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 4
I tried to remove the "job_clusters" lines, but it complains that they are missing. The other option is using "existing_cluster_id" within the job, but this would conflict when I want to run the same job in production with another cluster.
 
Do you know how to make the job use the cluster defined for each target?
 
Regards
 
 
 
1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @ashdam,

Certainly! It sounds like you want the job to run on a specific cluster depending on the deployment target.

Let's explore some options:

  1. Conditional Cluster Selection:

    • Select the cluster per target environment: for example, reuse an existing cluster when deploying to development, and let the job create its own cluster in production.
    • In a bundle, each entry in the "targets" mapping can override a job's cluster settings, so you can define different clusters for different contexts (development, staging, production) and switch between them by deploying to a different target (see the sketch after this list).
  2. Dynamic Cluster Assignment:

    • Rather than hardcoding the cluster ID in your job definition, assign the cluster dynamically based on the target.
    • You could use bundle variables, environment variables, configuration files, or a central service that maps targets to clusters.
    • For instance, your job definition could reference a variable such as ${var.cluster_id}, with each target supplying its own value.
  3. Cluster Mapping Table:

    • Create a mapping table that associates each target environment with a specific cluster.
    • In your job definition, look up the target environment and select the corresponding cluster from the table.
    • This approach provides flexibility and avoids hardcoding cluster IDs directly in your job definition.
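
For example, with asset bundles the cluster choice can live entirely in the "targets" mapping of databricks.yml. Here is a minimal sketch of option 1 (the cluster ID and node type are placeholders, and the override/merge behaviour should be verified against your CLI version):

"""
# Sketch: per-target cluster selection. The base job definition leaves
# the cluster unset so that each target supplies its own; task-level
# overrides are matched to the base job by task_key.
targets:
  dev:
    mode: development
    resources:
      jobs:
        my_project_job:
          tasks:
            - task_key: notebook_task
              # Reuse a cluster you already have access to in dev
              # (placeholder ID).
              existing_cluster_id: 0101-123456-abcdefgh
  prod:
    mode: production
    resources:
      jobs:
        my_project_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
                autoscale:
                  min_workers: 1
                  max_workers: 4
"""

Deploying with "databricks bundle deploy -t dev" then reuses the existing cluster, while "-t prod" creates the production job cluster.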

Remember to document your approach clearly so that other team members can understand and maintain it. Choose the method that aligns best with your project’s requirements and organizational practices.

If you need further assistance, feel free to ask! 😊


2 REPLIES


SvenG
New Contributor II

Hi @Kaniz,

Is it possible for you to provide a minimal working example for option 1 or option 3?
I currently have a test job:

"""

resources:
  jobs:
    my_project_job: #my_project_job_${bundle.target}
      name: Asset-bundle-test-job-${bundle.target}
      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam
      tasks:
        - task_key: notebook_task
          existing_cluster_id: ${var.my_existing_cluster}  
          notebook_task:
            notebook_path: ../src/notebook_${bundle.target}_test.ipynb
"""
with
 
"""
variables:
  my_existing_cluster:
    description: Id of my existing cluster
    default: 12345_my_id
"""
and I want to use a different cluster in prod and dev; however, the job that is executed should remain the same.
Any ideas how I can solve this issue?
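
I was thinking of overriding the variable per target, roughly like this (just a sketch; the cluster IDs below are placeholders):

"""
variables:
  my_existing_cluster:
    description: Id of my existing cluster

targets:
  dev:
    variables:
      # placeholder ID of the cluster to reuse in dev
      my_existing_cluster: 0101-123456-dev00000
  prod:
    variables:
      # placeholder ID of the cluster to use in prod
      my_existing_cluster: 0101-123456-prod0000
"""

so that "databricks bundle deploy -t prod" resolves ${var.my_existing_cluster} to the prod value. Is that the intended approach?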