<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</link>
    <description>Discussion thread: reusing a single job compute cluster across parent and child jobs after switching from all-purpose to job compute in Databricks.</description>
    <pubDate>Sun, 06 Apr 2025 21:40:50 GMT</pubDate>
    <dc:creator>Abiola1</dc:creator>
    <dc:date>2025-04-06T21:40:50Z</dc:date>
    <item>
      <title>Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114645#M44894</link>
      <description>&lt;P&gt;I’m transitioning from all-purpose clusters to job compute to optimize costs. Previously, we reused an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;existing_cluster_id&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in the job configuration to reduce total job runtime.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My use case:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;A&amp;nbsp;&lt;STRONG&gt;parent job&lt;/STRONG&gt;&amp;nbsp;triggers multiple&amp;nbsp;&lt;STRONG&gt;child jobs sequentially&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I want to&amp;nbsp;&lt;STRONG&gt;create a job compute cluster in the parent job&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;reuse the same cluster&lt;/STRONG&gt;&amp;nbsp;for all child jobs.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Has anyone implemented this? Any advice on achieving this setup would be greatly appreciated!&lt;/P&gt;</description>
      <pubDate>Sun, 06 Apr 2025 18:51:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114645#M44894</guid>
      <dc:creator>satyam-verma</dc:creator>
      <dc:date>2025-04-06T18:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</link>
      <description>&lt;P&gt;Hi, here are two possible approaches.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 1: Parent job creates the cluster and passes the cluster ID to child jobs (workaround)&lt;/STRONG&gt;&lt;BR /&gt;This works, but it requires manual lifecycle management:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Create a job compute cluster via the Clusters API (REST or CLI).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Extract the cluster_id from the create response.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Pass it as existing_cluster_id to all child jobs using the Jobs API or an orchestrator such as Databricks Workflows or Airflow.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Ensure each child job actually runs on that existing_cluster_id.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Have the parent job terminate the cluster at the end.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You must manage the lifecycle (start and terminate) yourself, and there is a risk of cost overruns if termination fails.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 2: Convert the child jobs into tasks of a single multi-task job&lt;/STRONG&gt;&lt;BR /&gt;Instead of separate jobs, use one multi-task job where each child becomes a task. All tasks can then share a single job cluster defined at the job level. This is the recommended and most cost-efficient approach:&lt;/P&gt;&lt;PRE&gt;{
  "job_clusters": [
    {
      "job_cluster_key": "shared_job_cluster",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "task1",
      "notebook_task": { "notebook_path": "/Tasks/Task1" },
      "job_cluster_key": "shared_job_cluster"
    },
    {
      "task_key": "task2",
      "depends_on": [ { "task_key": "task1" } ],
      "notebook_task": { "notebook_path": "/Tasks/Task2" },
      "job_cluster_key": "shared_job_cluster"
    }
  ]
}&lt;/PRE&gt;&lt;P&gt;All tasks here use the same shared_job_cluster, and it is terminated automatically once the job finishes.&lt;/P&gt;</description>
      <pubDate>Sun, 06 Apr 2025 21:40:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</guid>
      <dc:creator>Abiola1</dc:creator>
      <dc:date>2025-04-06T21:40:50Z</dc:date>
    </item>
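<!-- The Option 1 workaround in the reply above can be sketched in Python. This is a hedged sketch, not a definitive implementation: the endpoint paths follow the Databricks REST API (Clusters API 2.0, Jobs API 2.1), while the host, token, cluster spec, and child job IDs are placeholders, and the actual HTTP calls are left commented out. -->

```python
# Sketch of Option 1 (hedged): create a job cluster, hand its cluster_id
# to child jobs, then terminate it from the parent.
# HOST and TOKEN are placeholders, not real credentials.
import json
import urllib.request

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder


def api_request(path, payload):
    """POST a JSON payload to a Databricks REST endpoint and parse the reply."""
    req = urllib.request.Request(
        f"{HOST}{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cluster_spec():
    """Cluster definition; mirrors the shape used in the JSON example above."""
    return {
        "cluster_name": "parent-shared-cluster",
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    }


# Parent-job flow (not executed in this sketch):
#   cluster_id = api_request("/api/2.0/clusters/create", cluster_spec())["cluster_id"]
#   # Point each child job at the shared cluster (e.g. set existing_cluster_id
#   # in the child job's settings via jobs/update) before triggering it:
#   # api_request("/api/2.1/jobs/run-now", {"job_id": child_job_id})
#   api_request("/api/2.0/clusters/delete", {"cluster_id": cluster_id})  # terminate
```

<!-- As the reply notes, the termination step is the weak point: if the parent fails before the final clusters/delete call, the cluster keeps running and accruing cost. -->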
    <item>
      <title>Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114660#M44899</link>
      <description>&lt;P&gt;Hi&amp;nbsp;satyam-verma,&lt;/P&gt;&lt;P&gt;How are you doing today? As I understand it, switching from all-purpose clusters to job compute can definitely help with cost optimization. In your case, where a parent job triggers multiple child jobs, it makes sense to want to reuse the same job cluster to avoid the overhead of spinning up a new one each time. However, Databricks job clusters are ephemeral: they are created at the start of a job run and shut down when it finishes, so they cannot be reused across multiple jobs the way all-purpose clusters can. A common workaround is to refactor your child jobs into tasks within a single multi-task job using Databricks Workflows. That way, all the tasks can share the same job cluster defined at the job level, and you still get the cost-saving benefits of job compute. If you absolutely need separate jobs, the only way to share compute is to go back to an all-purpose cluster, but that may defeat the cost benefits. Let me know if you want help setting up a multi-task workflow!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Mon, 07 Apr 2025 02:37:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114660#M44899</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-07T02:37:06Z</dc:date>
    </item>
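<!-- The multi-task refactor recommended in the replies above can be sketched as a Jobs API 2.1 `jobs/create` call. Hedged sketch: the host, token, and job/notebook names are placeholders; the job spec mirrors the JSON from the earlier reply, and the network call is left commented out. -->

```python
# Sketch of Option 2 (hedged): register one multi-task job whose tasks all
# share a single job cluster. HOST and TOKEN are placeholders.
import json
import urllib.request

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder


def multi_task_job_spec():
    """Job settings: two notebook tasks sharing one job-level cluster."""
    return {
        "name": "parent-with-child-tasks",  # hypothetical job name
        "job_clusters": [{
            "job_cluster_key": "shared_job_cluster",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }],
        "tasks": [
            {"task_key": "task1",
             "notebook_task": {"notebook_path": "/Tasks/Task1"},
             "job_cluster_key": "shared_job_cluster"},
            {"task_key": "task2",
             "depends_on": [{"task_key": "task1"}],
             "notebook_task": {"notebook_path": "/Tasks/Task2"},
             "job_cluster_key": "shared_job_cluster"},
        ],
    }


def create_job(spec):
    """POST the spec to /api/2.1/jobs/create; the response carries the job_id."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/create",
        data=json.dumps(spec).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]


# create_job(multi_task_job_spec())  # not executed in this sketch
```

<!-- Because the cluster is declared under job_clusters and referenced by job_cluster_key, Databricks starts it once for the run and terminates it when the last task finishes, which is the cost behavior the replies recommend. -->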
  </channel>
</rss>

