Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Behaviour of cluster launches in multi-task jobs

Serhii
Contributor

We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration we specify a cluster configuration and provide a job_cluster_key.

Question: it seems that when consecutive tasks within the workflow use the same cluster configuration, the cluster is not reused between tasks but created anew. Is there a way to configure the workflow so that the cluster is reused?

ACCEPTED SOLUTION

User16873043099
Contributor

Tasks within the same multi-task job can reuse clusters. A shared job cluster allows multiple tasks in the same job to use it: the cluster is created and started when the first task that uses it starts, and it terminates after the last task that uses it completes. Note that a shared job cluster is scoped to a single job run; it is not reused across separate runs of the job.

Reference: https://docs.databricks.com/workflows/jobs/jobs-api-updates.html

Sample API payload:

{
    "job_id": 123456789,
    "creator_user_name": "email@domain.com",
    "run_as_user_name": "email@domain.com",
    "run_as_owner": true,
    "settings": {
        "name": "MT job",
        "email_notifications": {
            "no_alert_for_skipped_runs": false
        },
        "timeout_seconds": 0,
        "max_concurrent_runs": 1,
        "tasks": [
            {
                "task_key": "task1",
                "notebook_task": {
                    "notebook_path": "/Users/email@domain.com/test",
                    "source": "WORKSPACE"
                },
                "job_cluster_key": "Shared_job_cluster",
                "timeout_seconds": 0,
                "email_notifications": {}
            },
            {
                "task_key": "task2",
                "depends_on": [
                    {
                        "task_key": "task1"
                    }
                ],
                "notebook_task": {
                    "notebook_path": "/Users/email@domain.com/test",
                    "source": "WORKSPACE"
                },
                "job_cluster_key": "Shared_job_cluster",
                "timeout_seconds": 0,
                "email_notifications": {}
            }
        ],
        "job_clusters": [
            {
                "job_cluster_key": "Shared_job_cluster",
                "new_cluster": {
                    "cluster_name": "",
                    "spark_version": "10.4.x-scala2.12",
                    "spark_conf": {
                        "spark.databricks.delta.preview.enabled": "true"
                    },
                    "azure_attributes": {
                        "first_on_demand": 1,
                        "availability": "ON_DEMAND_AZURE",
                        "spot_bid_max_price": -1
                    },
                    "node_type_id": "Standard_DS3_v2",
                    "spark_env_vars": {
                        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
                    },
                    "enable_elastic_disk": true,
                    "runtime_engine": "STANDARD",
                    "num_workers": 1
                }
            }
        ],
        "format": "MULTI_TASK"
    },
    "created_time": 1660842831328
}
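
Since the question is about dbx: the same shared job cluster can be declared in the dbx deployment file, which follows the Jobs API 2.1 schema. The sketch below is a minimal, hypothetical conf/deployment.yml, not a definitive configuration — the environments/workflows layout assumes dbx 0.7+ (older dbx versions use a top-level environment name with a "jobs" list), and the workflow name, cluster key, and Python file paths are placeholders:

environments:
  default:
    workflows:
      - name: "example-multitask-workflow"
        job_clusters:
          - job_cluster_key: "Shared_job_cluster"
            new_cluster:
              spark_version: "10.4.x-scala2.12"
              node_type_id: "Standard_DS3_v2"
              num_workers: 1
        tasks:
          - task_key: "task1"
            # Both tasks reference the same job_cluster_key ...
            job_cluster_key: "Shared_job_cluster"
            spark_python_task:
              python_file: "file://some_package/task1.py"
          - task_key: "task2"
            depends_on:
              - task_key: "task1"
            # ... so the cluster is created once, before task1, and reused by task2.
            job_cluster_key: "Shared_job_cluster"
            spark_python_task:
              python_file: "file://some_package/task2.py"

After deploying with dbx deploy, a run of this workflow should start the cluster once for task1 and keep it running until task2 completes.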


