
Creating a spot-only, single-node job compute cluster policy

Kash
Contributor III

Hi there,

I need some help creating a new cluster policy that uses a single spot-instance server to complete a job. I want to set this up as job compute to reduce costs, using just one spot instance.

The ETL jobs I need to run are very short and complete within a few minutes, so I don't think it's wise to spend 2 DBUs on something when 1 DBU would suffice.

Thank you in advance for your help!

K

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Below is the required policy. Spot instances need to be defined inside a pool, which is why the policy references a pool.

{
   "cluster_type":{
      "type":"fixed",
      "value":"job"
   },
   "spark_conf.spark.databricks.cluster.profile":{
      "type":"fixed",
      "value":"singleNode",
      "hidden":true
   },
   "instance_pool_id":{
      "type":"fixed",
      "value":"singleNodePoolId1",
      "hidden":true
   },
   "num_workers":{
      "type":"range",
      "maxValue":0
   }
}

Hi there,

Thank you for the quick reply. I'm looking to create a policy not for a pool but for any job in the workflow.

Here is the current policy I am playing with. Please let me know if you see where this is off.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "i3.xlarge",
    "isOptional": true
  },
  "num_workers" : {
    "type" : "fixed",
    "value" : 0,
    "hidden" : true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT",
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "unlimited",
    "defaultValue": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}

P.S. When I copy your code into the policy maker it says singleNodePoolId1 does not exist.
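
For anyone comparing against the policy above: as far as I know, a single-node cluster spec normally also sets `spark.master` to `local[*]` and tags the cluster with `ResourceClass: SingleNode`, and on AWS the driver is only placed on a spot instance when `first_on_demand` is 0. A hedged sketch of the extra entries that could be merged into the policy (these three keys are my assumption, not something confirmed in this thread, so check them against the current policy attribute reference before relying on them):

```json
{
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "SingleNode",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  }
}
```

Since `instance_pool_id` is forbidden in the policy above, these entries would sit alongside the existing `aws_attributes.*` settings rather than replace them.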

Hubert-Dudek
Esteemed Contributor III

This is the policy for the job, but if you want to use spot instances, you first need to create a pool with spot instances. singleNodePoolId1 is just an example value. Create a spot pool with one machine, name it whatever you like, and put that pool's ID in the JSON in place of singleNodePoolId1.
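
To make the pool step concrete, here is a minimal sketch of a request body for the Instance Pools API (`POST /api/2.0/instance-pools/create`) that describes a one-machine spot pool. The pool name, node type, and idle timeout are placeholder assumptions for illustration only; `i3.xlarge` is borrowed from the policy posted earlier in the thread.

```json
{
  "instance_pool_name": "single-node-spot-pool",
  "node_type_id": "i3.xlarge",
  "min_idle_instances": 0,
  "max_capacity": 1,
  "idle_instance_autotermination_minutes": 10,
  "aws_attributes": {
    "availability": "SPOT",
    "spot_bid_price_percent": 100
  }
}
```

The response should include an `instance_pool_id`; that ID, rather than the display name, is what goes into the policy's `instance_pool_id` value. The same pool can also be created from the UI instead of the API.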

jose_gonzalez
Moderator

Hi @Avkash Kana,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.