Databricks Community

alejandrofm · ‎07-31-2023

Hi! I have several tiny jobs that run in parallel and I want them to run on the same cluster:

- Tasks type Python Script: I send the parameters this way to run the pyspark scripts.

- Job compute cluster created as (copied JSON from Databricks Job UI)

How can I achieve that if I send 5 jobs they all run in the same instance of the cluster instead of instancing 5 clusters?

{
    "num_workers": 2,
    "cluster_name": "",
    "spark_version": "12.2.x-scala2.12",
    "spark_conf": {},
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "zone_id": "us-east-1d",
        "instance_profile_arn": "xxxx",
        "spot_bid_price_percent": 80,
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 1,
        "ebs_volume_size": 100
    },
    "node_type_id": "c5.2xlarge",
    "driver_node_type_id": "m5a.large",
    "ssh_public_keys": [],
    "spark_env_vars": {},
    "enable_elastic_disk": false,
    "cluster_source": "JOB",
    "init_scripts": [],
    "data_security_mode": "NONE"
}

Thanks!

KoenZandvliet · ‎08-04-2023

Unfortunately, running multiple jobs in parallel using a single job cluster is not supported (yet). New in databricks is the possibility to create a job that orchestrates multiple jobs. These jobs will however still use their own cluster (configuration).

In the case the time of instancing a cluster takes much longer than executing an actual job, you could consider using `all-purpose` cluster. Depending on your use-case it might also be possible to rewrite/re-configure your 5 jobs to a single job with 5 tasks.

View solution in original post

KoenZandvliet · ‎08-04-2023

Unfortunately, running multiple jobs in parallel using a single job cluster is not supported (yet). New in databricks is the possibility to create a job that orchestrates multiple jobs. These jobs will however still use their own cluster (configuration).

In the case the time of instancing a cluster takes much longer than executing an actual job, you could consider using `all-purpose` cluster. Depending on your use-case it might also be possible to rewrite/re-configure your 5 jobs to a single job with 5 tasks.