Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Creating a spot-only, single-node job compute cluster policy

Kash
Contributor III

Hi there,

I need some help creating a new cluster policy that uses a single spot-instance server to complete a job. I want to set this up as job compute to reduce costs, using just one spot instance.

The ETL jobs I need to run are very short and complete within a few minutes, so I don't think it's wise to spend 2 DBUs on something when 1 DBU would suffice.

Thank you in advance for your help!

K

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Below is the required policy. Spot instances have to be configured in the pool itself, which is why the policy references a pool.

{
   "cluster_type":{
      "type":"fixed",
      "value":"job"
   },
   "spark_conf.spark.databricks.cluster.profile":{
      "type":"fixed",
      "value":"singleNode",
      "hidden":true
   },
   "instance_pool_id":{
      "type":"fixed",
      "value":"singleNodePoolId1",
      "hidden":true
   },
   "num_workers":{
      "type":"range",
      "maxValue":0
   }
}

Kash
Contributor III

Hi there,

Thank you for the quick reply. I'm looking to create a policy not for a pool but for any job in the workflow.

Here is the current policy I am playing with. Please let me know if you see where this is off.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "i3.xlarge",
    "isOptional": true
  },
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT",
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "unlimited",
    "defaultValue": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}
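
For readers following along, one way to sanity-check a policy like the one above is to preview which cluster attributes it actually pins. The sketch below is not a Databricks API: `preview_cluster_spec` is a hypothetical helper that expands every policy path carrying a `value` or `defaultValue` into a nested dict, skipping `forbidden` entries and `cluster_type` (which restricts the kind of compute rather than setting a cluster field). It assumes at most one level of nesting, which holds for this policy, and it leaves the special `auto:latest-lts` value unresolved.

```python
import json

# The policy from the post above, abbreviated to one line per attribute.
POLICY = {
    "spark_conf.spark.databricks.cluster.profile": {"type": "fixed", "value": "singleNode", "hidden": True},
    "spark_version": {"type": "unlimited", "defaultValue": "auto:latest-lts"},
    "enable_elastic_disk": {"type": "fixed", "value": True, "hidden": True},
    "node_type_id": {"type": "unlimited", "defaultValue": "i3.xlarge", "isOptional": True},
    "num_workers": {"type": "fixed", "value": 0, "hidden": True},
    "aws_attributes.availability": {"type": "fixed", "value": "SPOT", "hidden": True},
    "aws_attributes.zone_id": {"type": "unlimited", "defaultValue": "auto", "hidden": True},
    "aws_attributes.spot_bid_price_percent": {"type": "fixed", "value": 100, "hidden": True},
    "instance_pool_id": {"type": "forbidden", "hidden": True},
    "driver_instance_pool_id": {"type": "forbidden", "hidden": True},
    "cluster_type": {"type": "fixed", "value": "job"},
}

# Attributes that constrain the policy itself and are not cluster fields.
POLICY_ONLY = {"cluster_type"}


def preview_cluster_spec(policy):
    """Expand fixed values and defaults into a nested dict (hypothetical helper)."""
    spec = {}
    for path, rule in policy.items():
        if path in POLICY_ONLY or rule.get("type") == "forbidden":
            continue
        # Prefer a fixed value; fall back to the default if one is given.
        value = rule.get("value", rule.get("defaultValue"))
        if value is None:
            continue
        # Split on the first dot only, so Spark conf keys keep their own dots.
        head, _, rest = path.partition(".")
        if rest:
            spec.setdefault(head, {})[rest] = value
        else:
            spec[head] = value
    return spec


if __name__ == "__main__":
    print(json.dumps(preview_cluster_spec(POLICY), indent=2))
```

The preview makes it easier to see what is and is not pinned. For example, this policy fixes `aws_attributes.availability` but says nothing about `aws_attributes.first_on_demand`, so it may be worth checking how that attribute affects the driver of a single-node cluster.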

P.S. When I copy your code into the policy maker it says singleNodePoolId1 does not exist.

Hubert-Dudek
Esteemed Contributor III

This is the policy for the job, but if you want to use spot instances, you first need to create a pool of spot instances. singleNodePoolId1 is just a placeholder. Create a spot pool with one machine, name it whatever you like, and put that pool's ID in the JSON.
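
The two steps described here can be sketched as plain JSON bodies. This is a minimal sketch, not a tested deployment: the field names follow the Databricks Instance Pools API as I understand it, and the pool name, node type (`i3.xlarge`, borrowed from the policy earlier in the thread), capacity, and idle timeout are illustrative assumptions. `spot_pool_request` and `pool_policy` are hypothetical helper names. The block only builds and prints JSON; sending the pool request and reading back the real pool ID is left out.

```python
import json


def spot_pool_request(name, node_type_id="i3.xlarge"):
    """Request body for a one-machine spot pool (all values are assumptions)."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": 0,   # keep nothing warm, so idle time costs nothing
        "max_capacity": 1,         # "a pool with 1 machine"
        "idle_instance_autotermination_minutes": 10,
        "aws_attributes": {
            "availability": "SPOT",
            "spot_bid_price_percent": 100,
        },
    }


def pool_policy(pool_id):
    """The pool-based policy from this thread, with the placeholder ID replaced."""
    return {
        "cluster_type": {"type": "fixed", "value": "job"},
        "spark_conf.spark.databricks.cluster.profile": {
            "type": "fixed",
            "value": "singleNode",
            "hidden": True,
        },
        "instance_pool_id": {"type": "fixed", "value": pool_id, "hidden": True},
        "num_workers": {"type": "range", "maxValue": 0},
    }


if __name__ == "__main__":
    print(json.dumps(spot_pool_request("single-node-spot-pool"), indent=2))
    # "<your-pool-id>" stands in for the ID of the pool once it exists.
    print(json.dumps(pool_policy("<your-pool-id>"), indent=2))
```

Keeping the spot settings in the pool and only the pool ID in the policy matches the approach described in this reply; the policy earlier in the thread instead sets `aws_attributes` directly and forbids pools.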

jose_gonzalez
Databricks Employee

Hi @Avkash Kana,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
