Databricks Community

Kash · 01-24-2023

Hi there,

I need some help creating a new cluster policy that runs a job on a single spot-instance server. I want to set this up as job compute to reduce costs while using only one spot instance.

The ETL jobs I need to run are very short and complete within a few minutes, and I don't think it's wise to spend 2 DBUs on something when 1 DBU would suffice.

Thank you in advance for your help!

K

Hubert-Dudek · 01-24-2023

Below is the required policy. Spot instances need to be defined inside the pool, which is why I included a reference to the pool below.

{
   "cluster_type":{
      "type":"fixed",
      "value":"job"
   },
   "spark_conf.spark.databricks.cluster.profile":{
      "type":"fixed",
      "value":"singleNode",
      "hidden":true
   },
   "instance_pool_id":{
      "type":"fixed",
      "value":"singleNodePoolId1",
      "hidden":true
   },
   "num_workers":{
      "type":"range",
      "maxValue":0
   }
}
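For anyone who prefers scripting this over pasting it into the UI, here is a minimal Python sketch that rebuilds the policy above with the pool ID passed in as a parameter. It assumes the Cluster Policies API create endpoint (`POST /api/2.0/policies/clusters/create`) takes a `name` plus the policy `definition` as a JSON-encoded string. The helper names, the policy name, and `"1234-pool-example"` are hypothetical placeholders, not real values.

```python
import json


def single_node_pool_policy(pool_id: str) -> dict:
    """Return the single-node job policy above with the pool ID filled in.

    pool_id must be the ID of an existing instance pool, not its display name.
    """
    return {
        # Only job compute may use this policy.
        "cluster_type": {"type": "fixed", "value": "job"},
        # Single Node profile: the driver is the only node.
        "spark_conf.spark.databricks.cluster.profile": {
            "type": "fixed",
            "value": "singleNode",
            "hidden": True,
        },
        # Spot availability is configured on the pool itself.
        "instance_pool_id": {"type": "fixed", "value": pool_id, "hidden": True},
        # maxValue 0 means no worker nodes can be added.
        "num_workers": {"type": "range", "maxValue": 0},
    }


def policy_create_request(name: str, pool_id: str) -> dict:
    """Build a request body for the Cluster Policies create endpoint.

    Assumption: the endpoint expects `definition` as a JSON string,
    not as a nested object.
    """
    return {"name": name, "definition": json.dumps(single_node_pool_policy(pool_id))}


if __name__ == "__main__":
    # Hypothetical policy name and pool ID.
    body = policy_create_request("single-node-spot-jobs", "1234-pool-example")
    print(json.dumps(body, indent=2))
```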

Kash · 01-24-2023

Hi there,

Thank you for the quick reply. I'm looking to create a policy that doesn't use a pool and can be applied to any job in the workflow.

Here is the policy I'm currently experimenting with. Please let me know if you see anything wrong with it.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "i3.xlarge",
    "isOptional": true
  },
  "num_workers" : {
    "type" : "fixed",
    "value" : 0,
    "hidden" : true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT",
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "unlimited",
    "defaultValue": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}
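One thing worth checking in the policy above: my understanding from the Databricks Single Node compute documentation is that a single-node cluster also expects `spark_conf.spark.master` set to `local[*]` and a `custom_tags.ResourceClass` tag of `SingleNode`, which the UI normally adds when you pick Single Node. The Python sketch below (the helper `missing_single_node_settings` is hypothetical) lists which of those expected values a policy does not pin as `fixed`. Treat the expected-key list as an assumption and verify it against the current docs.

```python
# Fixed values that, per my reading of the Databricks Single Node docs,
# a single-node cluster policy should pin. This list is an assumption.
EXPECTED_SINGLE_NODE = {
    "spark_conf.spark.databricks.cluster.profile": "singleNode",
    "spark_conf.spark.master": "local[*]",
    "custom_tags.ResourceClass": "SingleNode",
    "num_workers": 0,
}


def missing_single_node_settings(policy: dict) -> list[str]:
    """Return expected keys that the policy does not fix to the expected value."""
    missing = []
    for key, expected in EXPECTED_SINGLE_NODE.items():
        rule = policy.get(key, {})
        # Satisfied only if the rule is "fixed" and has the expected value.
        if rule.get("type") != "fixed" or rule.get("value") != expected:
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Abbreviated version of the policy above (single-node-related keys only).
    policy = {
        "spark_conf.spark.databricks.cluster.profile": {
            "type": "fixed", "value": "singleNode", "hidden": True,
        },
        "num_workers": {"type": "fixed", "value": 0, "hidden": True},
        "aws_attributes.availability": {"type": "fixed", "value": "SPOT", "hidden": True},
        "cluster_type": {"type": "fixed", "value": "job"},
    }
    print(missing_single_node_settings(policy))
    # → ['spark_conf.spark.master', 'custom_tags.ResourceClass']
```

If the check flags those two keys, adding them as `fixed` (and `hidden`) entries alongside the existing profile entry keeps the policy self-consistent.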

P.S. When I paste your policy into the policy editor, it says singleNodePoolId1 does not exist.

Hubert-Dudek · 01-25-2023

This is the policy for the job, but if you want to use spot instances, you first need to create a pool with spot instances. singleNodePoolId1 is just an example placeholder. Create a spot pool with one machine, name it however you want, and put that pool's ID (not its display name) in the JSON.
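Creating that pool can also be scripted. This is a minimal Python sketch, assuming the Instance Pools API create endpoint (`POST /api/2.0/instance-pools/create`) with AWS spot attributes. The workspace URL, token, pool name, node type, and the 10-minute idle timeout are placeholders, and `max_capacity=1` is my reading of "a pool with 1 machine". The response's `instance_pool_id` is the value that goes into the policy's `instance_pool_id` field.

```python
import json
import urllib.request


def spot_pool_payload(name: str, node_type_id: str) -> dict:
    """Request body for a one-instance spot pool on AWS (values are examples)."""
    return {
        "instance_pool_name": name,          # display name; any name works
        "node_type_id": node_type_id,        # e.g. "i3.xlarge"
        "min_idle_instances": 0,             # keep nothing warm between runs
        "max_capacity": 1,                   # at most one instance in the pool
        "idle_instance_autotermination_minutes": 10,
        "aws_attributes": {
            "availability": "SPOT",
            "spot_bid_price_percent": 100,
        },
    }


def create_pool(host: str, token: str, payload: dict) -> str:
    """POST the payload and return the new pool's ID (requires network access)."""
    req = urllib.request.Request(
        f"{host}/api/2.0/instance-pools/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["instance_pool_id"]


if __name__ == "__main__":
    payload = spot_pool_payload("single-node-spot", "i3.xlarge")
    print(json.dumps(payload, indent=2))
    # Hypothetical placeholders: substitute your workspace URL and a real token.
    # pool_id = create_pool("https://<workspace-url>", "<token>", payload)
```

With `max_capacity` set to 1, only one job cluster can draw from the pool at a time, so raise it if several of these short jobs may overlap.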

jose_gonzalez · 02-24-2023

Hi @Avkash Kana,

Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.


Creating a spot only single-node job compute cluster policy
