Hi, here are a couple of possible workaround options:
Option 1: Parent Job Creates Cluster, Passes Cluster ID to Child Jobs (Workaround)
This is a clever workaround but requires manual management:
Create a cluster manually via API or CLI (using the Clusters API). Note that a cluster created this way is all-purpose compute rather than job compute.
Extract the cluster_id after creation.
Pass that cluster_id to all child jobs via the Jobs API or an orchestration tool such as Databricks Workflows or Airflow.
Ensure each child job's tasks reference it through existing_cluster_id instead of defining their own cluster.
Have the parent job terminate the cluster at the end.
You must manage the cluster lifecycle (start and terminate) yourself, and there's a risk of cost overruns if termination fails. A minimal sketch of these calls is shown below.
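For illustration, here is a minimal sketch of that lifecycle using the REST API from Python. The workspace URL, token, job name, and notebook path are placeholders, and error handling/polling is omitted:

import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder PAT

# 1. Create the cluster (the Clusters API creates all-purpose compute).
create_resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers=HEADERS,
    json={
        "cluster_name": "parent-shared-cluster",
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
        "autotermination_minutes": 60,  # safety net in case the explicit terminate step fails
    },
)
cluster_id = create_resp.json()["cluster_id"]

# 2. Point each child job at the existing cluster via existing_cluster_id.
requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers=HEADERS,
    json={
        "name": "child-job-1",  # placeholder job name
        "tasks": [
            {
                "task_key": "child1",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": "/Tasks/Child1"},  # placeholder path
            }
        ],
    },
)

# 3. After all child jobs have finished, terminate the cluster.
requests.post(
    f"{HOST}/api/2.0/clusters/delete",
    headers=HEADERS,
    json={"cluster_id": cluster_id},
)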
Option 2: Convert Child Jobs into Tasks of a Single Multi-Task Job (Jobs API)
Instead of separate jobs, use one multi-task job where each child is a task.
All tasks can share a single job cluster defined at the job level.
This is the recommended and most cost-efficient approach. For example:
{
  "job_clusters": [
    {
      "job_cluster_key": "shared_job_cluster",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "task1",
      "notebook_task": { "notebook_path": "/Tasks/Task1" },
      "job_cluster_key": "shared_job_cluster"
    },
    {
      "task_key": "task2",
      "depends_on": [ { "task_key": "task1" } ],
      "notebook_task": { "notebook_path": "/Tasks/Task2" },
      "job_cluster_key": "shared_job_cluster"
    }
  ]
}
All tasks here use the same shared_job_cluster, and it's automatically terminated once the job finishes.
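If it helps, the definition above can be registered and triggered with the Jobs API, e.g. from Python. The workspace URL, token, job name, and the job.json filename are placeholders:

import json
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder PAT

# Load the multi-task job definition shown above and give it a name.
with open("job.json") as f:
    job_spec = json.load(f)
job_spec["name"] = "parent-with-shared-job-cluster"

# Create the job, then trigger a run; a schedule or trigger can be added later.
job_id = requests.post(f"{HOST}/api/2.1/jobs/create", headers=HEADERS, json=job_spec).json()["job_id"]
requests.post(f"{HOST}/api/2.1/jobs/run-now", headers=HEADERS, json={"job_id": job_id})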
Abiola