I want to be able to submit a single job run that uses both cluster pools and init scripts. I'm sending the following payload:
{
  "run_name": "A multitask job run",
  "timeout_seconds": 86400,
  "tasks": [
    {
      "task_key": "task_1",
      "depends_on": [],
      "notebook_task": {
        "notebook_path": "/Workspace/Users/johndoe/task_1",
        "source": "WORKSPACE"
      },
      "new_cluster": {
        "spark_version": "15.3.x-scala2.12",
        "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
        "driver_instance_pool_id": "0926-080838-lute60-pool-91skna4w",
        "num_workers": 1,
        "init_scripts": [
          {
            "s3": {
              "destination": "s3://bucket_name/init_scripts/install_utils.sh"
            }
          }
        ]
      }
    }
  ]
}
When I send this payload to the
/api/2.1/jobs/runs/submit
endpoint, the run starts but the init scripts are never applied to the cluster, while if I send the exact same payload to
/api/2.1/jobs/create
it creates a job that uses both the cluster pools and the init scripts.
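
I can reproduce this outside of any client library with plain HTTP calls along these lines (a rough sketch; the host and token are placeholders, and payload.json is just the JSON above saved to a file):

import json
import requests

# Placeholders: my workspace URL, a personal access token, and the payload above saved to a file.
HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"
headers = {"Authorization": f"Bearer {TOKEN}"}

with open("payload.json") as f:
    payload = json.load(f)

# One-time run: the run starts, but the cluster comes up without running the init script.
submit = requests.post(f"{HOST}/api/2.1/jobs/runs/submit", headers=headers, json=payload)
print(submit.json())  # {"run_id": ...}

# Job definition: the created job keeps both the pool IDs and the init_scripts.
create = requests.post(f"{HOST}/api/2.1/jobs/create", headers=headers, json=payload)
print(create.json())  # {"job_id": ...}
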
I'm using Airflow operators such as DatabricksSubmitRunOperator (or DatabricksNotebookOperator), both of which invoke the submit endpoint, so as soon as I want to use cluster pools the init scripts no longer apply.
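
Roughly, my setup looks like this (a simplified sketch; the DAG name, schedule, and databricks_conn_id are placeholders, and the payload is the same JSON as above):

from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Same payload as at the top of the post.
payload = {
    "run_name": "A multitask job run",
    "timeout_seconds": 86400,
    "tasks": [{
        "task_key": "task_1",
        "depends_on": [],
        "notebook_task": {"notebook_path": "/Workspace/Users/johndoe/task_1", "source": "WORKSPACE"},
        "new_cluster": {
            "spark_version": "15.3.x-scala2.12",
            "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
            "driver_instance_pool_id": "0926-080838-lute60-pool-91skna4w",
            "num_workers": 1,
            "init_scripts": [{"s3": {"destination": "s3://bucket_name/init_scripts/install_utils.sh"}}],
        },
    }],
}

with DAG("databricks_pool_init_scripts", start_date=datetime(2024, 1, 1), schedule=None, catchup=False) as dag:
    # The operator forwards `json` to /api/2.1/jobs/runs/submit.
    DatabricksSubmitRunOperator(
        task_id="submit_run",
        databricks_conn_id="databricks_default",  # placeholder connection ID
        json=payload,
    )
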
Please let me know why this behavior happens. Is it intentional, or a known limitation?
Thank you.