08-18-2022 09:23 AM
We are adapting the multi-task workflow example from the dbx documentation for our pipelines: https://dbx.readthedocs.io/en/latest/examples/python_multitask_deployment_example.html. As part of the configuration we specify a cluster configuration and provide a job_cluster_key.
Question: it seems that when consecutive jobs within the workflow use the same cluster, the cluster is not reused between jobs but created anew. Is there a way to configure the workflow so that the cluster is reused?
08-18-2022 10:22 AM
Tasks within the same multi-task job can reuse a cluster. A shared job cluster is declared once under job_clusters, and every task that references its job_cluster_key runs on that same cluster. The cluster is created and started when the first task using it starts, and it terminates after the last task using it completes.
Reference: https://docs.databricks.com/workflows/jobs/jobs-api-updates.html
Sample API payload:
{
  "job_id": 123456789,
  "creator_user_name": "email@domain.com",
  "run_as_user_name": "email@domain.com",
  "run_as_owner": true,
  "settings": {
    "name": "MT job",
    "email_notifications": {
      "no_alert_for_skipped_runs": false
    },
    "timeout_seconds": 0,
    "max_concurrent_runs": 1,
    "tasks": [
      {
        "task_key": "task1",
        "notebook_task": {
          "notebook_path": "/Users/email@domain.com/test",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "Shared_job_cluster",
        "timeout_seconds": 0,
        "email_notifications": {}
      },
      {
        "task_key": "task2",
        "depends_on": [
          {
            "task_key": "task1"
          }
        ],
        "notebook_task": {
          "notebook_path": "/Users/email@domain.com/test",
          "source": "WORKSPACE"
        },
        "job_cluster_key": "Shared_job_cluster",
        "timeout_seconds": 0,
        "email_notifications": {}
      }
    ],
    "job_clusters": [
      {
        "job_cluster_key": "Shared_job_cluster",
        "new_cluster": {
          "cluster_name": "",
          "spark_version": "10.4.x-scala2.12",
          "spark_conf": {
            "spark.databricks.delta.preview.enabled": "true"
          },
          "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": -1
          },
          "node_type_id": "Standard_DS3_v2",
          "spark_env_vars": {
            "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
          },
          "enable_elastic_disk": true,
          "runtime_engine": "STANDARD",
          "num_workers": 1
        }
      }
    ],
    "format": "MULTI_TASK"
  },
  "created_time": 1660842831328
}
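If you want to check that a job definition really shares one cluster across its tasks, a small script along these lines may help. It is only a sketch, not part of dbx or the Databricks SDK: the function name summarize_cluster_usage is made up for illustration, the settings dict is a trimmed-down version of the "settings" object in the payload above, and task3 (with an inline new_cluster) is a hypothetical task added to show what a non-shared task looks like.

```python
from collections import defaultdict


def summarize_cluster_usage(settings):
    """Group task keys by how each task declares its cluster.

    Hypothetical helper for illustration; it only inspects the dict
    and does not call any Databricks API.
    """
    # Keys declared once at job level under "job_clusters".
    declared = {c["job_cluster_key"] for c in settings.get("job_clusters", [])}

    shared = defaultdict(list)  # job_cluster_key -> tasks that reuse that cluster
    not_shared = []             # tasks with an inline new_cluster of their own
    undeclared = []             # tasks whose key is missing from job_clusters

    for task in settings.get("tasks", []):
        key = task.get("job_cluster_key")
        if key is None:
            if "new_cluster" in task:
                not_shared.append(task["task_key"])
            continue
        if key in declared:
            shared[key].append(task["task_key"])
        else:
            undeclared.append(task["task_key"])

    return dict(shared), not_shared, undeclared


# Trimmed-down "settings" object; task3 is a hypothetical extra task.
settings = {
    "tasks": [
        {"task_key": "task1", "job_cluster_key": "Shared_job_cluster"},
        {
            "task_key": "task2",
            "depends_on": [{"task_key": "task1"}],
            "job_cluster_key": "Shared_job_cluster",
        },
        {"task_key": "task3", "new_cluster": {"num_workers": 1}},
    ],
    "job_clusters": [
        {"job_cluster_key": "Shared_job_cluster", "new_cluster": {"num_workers": 1}},
    ],
}

shared, not_shared, undeclared = summarize_cluster_usage(settings)
print("shared:", shared)
print("own cluster per task:", not_shared)
print("undeclared keys:", undeclared)
```

Tasks listed under the same key in the first result are the ones that should run on one shared cluster. A task that shows up in the second list carries its own new_cluster block, which is one possible reason a cluster gets created anew for each task.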