Creating a spot-only, single-node job compute cluster policy

Kash
Contributor III

Hi there,

I need some help creating a new cluster policy that uses a single spot-instance server to complete a job. I want to set this up as job compute to reduce costs, running on one spot instance.

The ETL jobs I need to run are very short and complete within a few minutes, so I don't think it's wise to spend 2 DBUs on something when 1 DBU would suffice.

Thank you in advance for your help!

K


Hubert-Dudek
Esteemed Contributor III

Below is the required policy. Spot instances need to be defined in the instance pool, which is why the policy references a pool.

{
   "cluster_type":{
      "type":"fixed",
      "value":"job"
   },
   "spark_conf.spark.databricks.cluster.profile":{
      "type":"fixed",
      "value":"singleNode",
      "hidden":true
   },
   "instance_pool_id":{
      "type":"fixed",
      "value":"singleNodePoolId1",
      "hidden":true
   },
   "num_workers":{
      "type":"range",
      "maxValue":0
   }
}
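For illustration, a job's `new_cluster` block (Jobs API) governed by the policy above might look like the sketch below. `<your-policy-id>` is a hypothetical placeholder for the ID Databricks assigns when the policy is saved, `singleNodePoolId1` is the same placeholder pool ID used above, and the `spark_version` string is only an example runtime. The `spark.master` and `ResourceClass` entries are the settings Databricks documents for Single Node clusters; the node type is not set here because it comes from the pool.

```json
{
  "new_cluster": {
    "policy_id": "<your-policy-id>",
    "spark_version": "13.3.x-scala2.12",
    "instance_pool_id": "singleNodePoolId1",
    "num_workers": 0,
    "spark_conf": {
      "spark.databricks.cluster.profile": "singleNode",
      "spark.master": "local[*]"
    },
    "custom_tags": {
      "ResourceClass": "SingleNode"
    }
  }
}
```

Values fixed by the policy (profile, pool) are repeated here only for readability; they must match the policy or the cluster will be rejected.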

Hi there,

Thank you for the quick reply. I'm looking to create a policy that doesn't depend on a pool and can be used by any job in the workflow.

Here is the policy I'm currently experimenting with. Please let me know if you see anything wrong with it.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "singleNode",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "unlimited",
    "defaultValue": "i3.xlarge",
    "isOptional": true
  },
  "num_workers" : {
    "type" : "fixed",
    "value" : 0,
    "hidden" : true
  },
  "aws_attributes.availability": {
    "type": "fixed",
    "value": "SPOT",
    "hidden": true
  },
  "aws_attributes.zone_id": {
    "type": "unlimited",
    "defaultValue": "auto",
    "hidden": true
  },
  "aws_attributes.spot_bid_price_percent": {
    "type": "fixed",
    "value": 100,
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "driver_instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}
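Two things may be missing from the policy above. First, Databricks' documented Single Node policy also fixes `spark_conf.spark.master` to `local[*]` and `custom_tags.ResourceClass` to `SingleNode`. Second, the Clusters API describes `aws_attributes.first_on_demand` as the number of initial nodes, starting with the driver, placed on on-demand instances; because a single-node cluster consists only of the driver, it presumably needs to be `0` for that one node to run on spot. The entries below are an untested sketch to merge into the existing policy; it is worth confirming the `first_on_demand` default and behaviour in your workspace.

```json
{
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]",
    "hidden": true
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "SingleNode",
    "hidden": true
  },
  "aws_attributes.first_on_demand": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  }
}
```

Note that `"availability": "SPOT"` (as opposed to `SPOT_WITH_FALLBACK`) means the run fails rather than falling back to on-demand when no spot capacity is available.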

P.S. When I paste your code into the policy editor, it says singleNodePoolId1 does not exist.

Hubert-Dudek
Esteemed Contributor III

This policy is for the job, but to use spot instances you first need to create a pool that uses spot instances. singleNodePoolId1 is just a placeholder. Create a spot pool with one machine, name it however you like, and put its pool ID in the JSON.
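As a sketch, a request body for the Instance Pools API (`POST /api/2.0/instance-pools/create`) describing such a pool might look like this. The pool name, the 10-minute idle timeout, and the `i3.xlarge` node type (borrowed from the policy earlier in the thread) are illustrative assumptions. The response contains an `instance_pool_id`, and that generated ID, not the display name, is what the policy's `instance_pool_id` value expects.

```json
{
  "instance_pool_name": "single-node-spot-pool",
  "node_type_id": "i3.xlarge",
  "min_idle_instances": 0,
  "max_capacity": 1,
  "idle_instance_autotermination_minutes": 10,
  "aws_attributes": {
    "availability": "SPOT",
    "spot_bid_price_percent": 100
  }
}
```

With `max_capacity` set to 1, only one cluster can draw from the pool at a time, so overlapping job runs would need a higher cap.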

jose_gonzalez
Moderator

Hi @Avkash Kana,

Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.
