
Databricks asset bundles: is it possible to use a different cluster depending on the target (environment)?

ashdam
New Contributor III

Here is my bundle definition

 

 
# This is a Databricks asset bundle definition for my_project.

experimental:
  python_wheel_wrapper: true

bundle:
  name: my_project

include:
  - resources/*.yml

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    compute_id: xxxxx-yyyyyyyy-zzzzzzz
    workspace:

  # Optionally, there could be a 'staging' target here.
  # (See Databricks docs on CI/CD at https://docs.databricks.com/dev-tools/bundles/index.html.)
  #
  # staging:
  #  workspace:

  # The 'prod' target, used for production deployment.
  prod:
    # For production deployments, we only have a single copy, so we override the
    # workspace.root_path default of
    # /Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
    # to a path that is not specific to the current user.
    mode: production
    workspace:
      root_path: /Shared/.bundle/prod/${bundle.name}
    run_as:
      # This runs as gonzalomoran@ppg.com in production. Alternatively,
      # a service principal could be used here using service_principal_name
      # (see Databricks documentation).
      user_name: gonzalomoran@ppg.com
 
My user has no rights to create a new cluster, but the job definition tries to create a new one.
 
 
 
# The main job for my_project
resources:
  jobs:
    my_project_job:
      name: my_project_job

      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam

      email_notifications:
        on_failure:
          - gonzalomoran@ppg.com

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb
       
        - task_key: refresh_pipeline
          depends_on:
            - task_key: notebook_task
          pipeline_task:
            pipeline_id: ${resources.pipelines.my_project_pipeline.id}
       
        - task_key: main_task
          depends_on:
            - task_key: refresh_pipeline
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See the Databricks asset bundle documentation for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 4
I tried to remove the "job_clusters" lines, but then it complains that they are missing. The other option is using "existing_cluster_id" within the job, but this would conflict when I want to run the same job in production with another cluster.
 
Do you know how to make the job use the cluster defined for each target?
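
For clarity, here is a rough sketch of what I am trying to express, assuming the targets section can override the job's cluster definition per environment (the num_workers values are just placeholders):

# Sketch only: the same job, but with its cluster spec overridden per target in databricks.yml.
targets:
  dev:
    mode: development
    resources:
      jobs:
        my_project_job:
          job_clusters:
            - job_cluster_key: job_cluster   # same key the tasks reference
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
                num_workers: 1               # small placeholder cluster for dev

  prod:
    mode: production
    resources:
      jobs:
        my_project_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
                num_workers: 4               # larger placeholder cluster for prod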
 
Regards
 
 
 
1 REPLY

SvenG
New Contributor II

Hi @Retired_mod,

Could you provide a minimal working example for option 1 or option 3?
I currently have a test job:

"""

resources:
  jobs:
    my_project_job: #my_project_job_${bundle.target}
      name: Asset-bundle-test-job-${bundle.target}
      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam
      tasks:
        - task_key: notebook_task
          existing_cluster_id: ${var.my_existing_cluster}  
          notebook_task:
            notebook_path: ../src/notebook_${bundle.target}_test.ipynb
"""
with
 
"""
variables:
  my_existing_cluster:
    description: ID of my existing cluster
    default: 12345_my_id
"""
and I want to use a different cluster in prod and dev; however, the job that is executed should remain the same.
Any ideas how I can solve this?
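
What I imagine (but have not verified) is giving the variable a different value per target in databricks.yml, roughly like the sketch below; the cluster IDs are placeholders:

"""
# Sketch only: the job stays the same, but the existing-cluster variable
# resolves to a different (placeholder) cluster ID per target.
targets:
  dev:
    variables:
      my_existing_cluster: 0123-456789-devclstr    # placeholder dev cluster ID
  prod:
    variables:
      my_existing_cluster: 0123-456789-prodclstr   # placeholder prod cluster ID
"""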
