Hi! I have several tiny jobs that run in parallel and I want them to run on the same cluster:
- Task type Python script: this is how I pass the parameters to run the PySpark scripts.
- Job compute cluster created with the configuration below (JSON copied from the Databricks Jobs UI).
How can I make it so that if I submit 5 jobs, they all run on the same cluster instance instead of spinning up 5 separate clusters?
{
  "num_workers": 2,
  "cluster_name": "",
  "spark_version": "12.2.x-scala2.12",
  "spark_conf": {},
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-east-1d",
    "instance_profile_arn": "xxxx",
    "spot_bid_price_percent": 80,
    "ebs_volume_type": "GENERAL_PURPOSE_SSD",
    "ebs_volume_count": 1,
    "ebs_volume_size": 100
  },
  "node_type_id": "c5.2xlarge",
  "driver_node_type_id": "m5a.large",
  "ssh_public_keys": [],
  "spark_env_vars": {},
  "enable_elastic_disk": false,
  "cluster_source": "JOB",
  "init_scripts": [],
  "data_security_mode": "NONE"
}
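For context, here is a minimal sketch of roughly how I submit each run today, using the Jobs `runs/submit` REST API from Python. The host, token, script paths, and parameter values are placeholders, not my real setup; the point is that every submission carries its own `new_cluster` block, so 5 submissions end up creating 5 job clusters.

```python
# Rough sketch of my current submission flow (placeholders throughout).
import json
import requests

DATABRICKS_HOST = "https://<my-workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "<personal-access-token>"                      # placeholder

# Same cluster spec as the JSON above, kept in a file for brevity (assumed name).
with open("job_cluster.json") as f:
    new_cluster = json.load(f)

def submit_run(script_path: str, parameters: list[str]) -> int:
    """Submit a one-time run with a Python script task on a fresh job cluster."""
    payload = {
        "run_name": f"tiny-job-{script_path.split('/')[-1]}",
        "tasks": [
            {
                "task_key": "main",
                "spark_python_task": {
                    "python_file": script_path,  # DBFS/Workspace path to the PySpark script
                    "parameters": parameters,    # arguments passed to the script
                },
                # Each submission has its own new_cluster block, so each run
                # gets its own job cluster instead of sharing one.
                "new_cluster": new_cluster,
            }
        ],
    }
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/runs/submit",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["run_id"]

# The tiny jobs, submitted in parallel -- today each one spins up its own cluster.
run_ids = [
    submit_run("dbfs:/scripts/job_a.py", ["--date", "2024-01-01"]),
    submit_run("dbfs:/scripts/job_b.py", ["--date", "2024-01-01"]),
    # ...and so on for the remaining jobs
]
```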
Thanks!