
Specifying a serverless cluster for the dev environment in databricks.yml

lukasz_wybieral
New Contributor II

 

Hey, I'm trying to find a way to specify a serverless cluster for the dev environment and job clusters for the test and prod environments in databricks.yml.

The problem is that it seems impossible. I've tried many approaches, but the only outcomes I can achieve are:

A. I don't specify any cluster, and all three environments run on serverless.
B. I specify a cluster, but then I can't explicitly set dev to serverless (it has to be one of the predefined job clusters).

I've searched everywhere but haven't found any information on how to do this.

Has anyone found a way to make this work?

Here's the code I wish would work (or something similar), so you know what I mean:

#databricks.yml file

bundle:
  name: medallion_architecture

include:
  - resources/*.yml

variables:
  catalog:
    type: string
  cluster_config:
    type: complex # declared here so each target can set its own value below

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: adb-xxxxxxxxxxxxxx.azuredatabricks.net
    variables:
      catalog: "dev"
      cluster_config: serverless # (this doesn't work)

  test:
    mode: production
    workspace:
      host: adb-xxxxxxxxxxxxxx.azuredatabricks.net
      root_path: /Shared/.bundle/${bundle.target}/${bundle.name}
    variables:
      catalog: "test"
      cluster_config:
        spark_version: "15.4.x-scala2.12"
        node_type_id: "Standard_D8s_v3"
        autoscale:
          min_workers: 2
          max_workers: 6
        azure_attributes:
          availability: "ON_DEMAND_AZURE"


  prod:
    mode: production
    workspace:
      host: adb-xxxxxxxxxxxxxx.azuredatabricks.net
      root_path: /Shared/.bundle/${bundle.target}/${bundle.name}
    run_as:
      user_name: ${workspace.current_user.userName}
    variables:
      catalog: "prod"
      cluster_config:
        spark_version: "15.4.x-scala2.12"
        node_type_id: "Standard_D8s_v3"
        autoscale:
          min_workers: 2
          max_workers: 10
        azure_attributes:
          availability: "ON_DEMAND_AZURE"
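
For context, the job resource that would consume this variable looks roughly like this (job and task names are just placeholders); test and prod can pass a full cluster spec into new_cluster, but there is no value of the variable that means "run on serverless" for dev:

# resources/medallion_job.yml (illustrative only)
resources:
  jobs:
    medallion_job:
      name: medallion_job
      tasks:
        - task_key: bronze_ingest
          notebook_task:
            notebook_path: ../src/bronze_ingest.ipynb
          # test/prod can inject a full cluster spec here, but there is no
          # variable value that would mean "use serverless" for dev
          new_cluster: ${var.cluster_config}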

 

2 REPLIES

Nivethan_Venkat
Contributor III

Hi @lukasz_wybieral

It is not necessary to specify cluster_config if you want to use serverless. By default, Databricks runs the job on serverless compute when you don't specify any cluster configuration. Attaching my databricks.yml below for your reference:

[screenshot attachment: Nivethan_Venkat_0-1754990653880.png, a databricks.yml with no cluster configuration]
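
In text form, a minimal version of that bundle looks roughly like this (job and notebook names are placeholders); because the task specifies no new_cluster, job_cluster_key, or existing_cluster_id, it runs on serverless compute:

resources:
  jobs:
    sample_serverless_job:
      name: sample_serverless_job
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/ingest.ipynb
          # no cluster settings at all -> the task runs on serverless compute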

This is how my job looks when no cluster configuration is specified:

[screenshot attachment: Nivethan_Venkat_1-1754990723131.png, the job's compute shown as Serverless in the Jobs UI]

It is also worth checking which job types support serverless compute here: https://docs.databricks.com/aws/en/jobs/compute


Thanks & Regards,
Nivethan V



lukasz_wybieral
New Contributor II

Hey Nivethan, thanks, I'm aware of that, but it doesn't solve my problem.

I want to be able to define different clusters for each environment, specifically:

  1. Serverless for dev

  2. Medium job cluster for test

  3. Large job cluster for prod

Currently, it's either:

  • Define job clusters (anything except serverless) for all environments, or

  • Don't define any at all (and every target defaults to serverless).

It's not a huge issue, but it's annoying.
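
In case anyone else lands here: one workaround that might be worth trying (untested, so treat it as a sketch) is to leave the task without any cluster at the top level, so dev stays on serverless, and add job_clusters plus a job_cluster_key only as target overrides for test and prod, roughly like this:

targets:
  test:
    mode: production
    resources:
      jobs:
        medallion_job:
          job_clusters:
            - job_cluster_key: medium_cluster
              new_cluster:
                spark_version: "15.4.x-scala2.12"
                node_type_id: "Standard_D8s_v3"
                autoscale:
                  min_workers: 2
                  max_workers: 6
          tasks:
            - task_key: bronze_ingest
              # merged with the top-level task by task_key, pinning it to the
              # medium job cluster instead of serverless for this target
              job_cluster_key: medium_cluster

prod would get the same override with a larger cluster.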