Databricks cluster pools with init scripts

radix
New Contributor II

I want to submit a single job run that uses both cluster pools and init scripts.

For the following payload:

{
    "run_name": "A multitask job run",
    "timeout_seconds": 86400,
    "tasks": [
        {
            "task_key": "task_1",
            "depends_on": [],
            "notebook_task": {
                "notebook_path": "/Workspace/Users/johndoe/task_1",
                "source": "WORKSPACE"
            },
            "new_cluster": {
                "spark_version": "15.3.x-scala2.12",
                "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
                "driver_instance_pool_id": "0926-080838-lute60-pool-91skna4w",
                "num_workers": 1,
                "init_scripts": [
                    {
                        "s3": {
                            "destination": "s3://bucket_name/init_scripts/install_utils.sh"
                        }
                    }
                ]
            }
        }
    ]
}

 

 

this endpoint:

/api/2.1/jobs/runs/submit

runs the job without applying the init scripts.

However, if I send the same payload to:

/api/2.1/jobs/create

it creates a job that uses both the cluster pool and the init scripts.
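To rule out differences in the requests themselves, both calls can be built from one shared task definition. The following is only a minimal sketch: `build_request`, the host `https://example.cloud.databricks.com`, and the token value are hypothetical placeholders, and the code only constructs the requests without sending them. It also assumes the one field that differs is the name key (`run_name` for `runs/submit`, `name` for `jobs/create`).

```python
import json
import urllib.request

# Task definition shared by both calls (copied from the payload above).
TASK = {
    "task_key": "task_1",
    "depends_on": [],
    "notebook_task": {
        "notebook_path": "/Workspace/Users/johndoe/task_1",
        "source": "WORKSPACE",
    },
    "new_cluster": {
        "spark_version": "15.3.x-scala2.12",
        "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
        "driver_instance_pool_id": "0926-080838-lute60-pool-91skna4w",
        "num_workers": 1,
        "init_scripts": [
            {"s3": {"destination": "s3://bucket_name/init_scripts/install_utils.sh"}}
        ],
    },
}

HOST = "https://example.cloud.databricks.com"  # hypothetical workspace URL
TOKEN = "dapi-placeholder"                     # hypothetical token


def build_request(host, token, endpoint):
    """Build (but do not send) a POST request for the given Jobs API endpoint."""
    # Assumption: runs/submit names the run with "run_name", jobs/create with "name".
    name_key = "run_name" if endpoint.endswith("/runs/submit") else "name"
    body = {
        name_key: "A multitask job run",
        "timeout_seconds": 86400,
        "tasks": [TASK],
    }
    return urllib.request.Request(
        url=host.rstrip("/") + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


submit_req = build_request(HOST, TOKEN, "/api/2.1/jobs/runs/submit")
create_req = build_request(HOST, TOKEN, "/api/2.1/jobs/create")
```

Passing either request to `urllib.request.urlopen` would send it. Because both bodies come from the same `TASK` dict, any difference in how init scripts are applied cannot come from the cluster spec that was sent.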

I'm using Airflow operators such as DatabricksSubmitRunOperator (or DatabricksNotebookOperator), both of which invoke the submit endpoint, so if I want to use cluster pools, the init scripts are no longer applied.
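One way to check what a submitted run actually received is to fetch its details from `/api/2.1/jobs/runs/get` and inspect each task's cluster spec. The helper below is a rough sketch, not a confirmed diagnostic: `tasks_missing_init_scripts` is a hypothetical name, and it assumes the response lists tasks under `tasks` with the submitted `new_cluster` echoed back on each one. The sample response is made up purely to exercise the function.

```python
def tasks_missing_init_scripts(run):
    """Return the task_keys whose new_cluster spec carries no init_scripts."""
    missing = []
    for task in run.get("tasks", []):
        cluster = task.get("new_cluster")
        if cluster is None:
            # Task runs on an existing cluster; there is no spec to check.
            continue
        if not cluster.get("init_scripts"):
            missing.append(task.get("task_key", "<unknown>"))
    return missing


# Hypothetical, trimmed-down response shape (an assumption, not real output).
sample_run = {
    "run_id": 1,
    "tasks": [
        {
            "task_key": "task_1",
            "new_cluster": {
                "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
                "init_scripts": [
                    {"s3": {"destination": "s3://bucket_name/init_scripts/install_utils.sh"}}
                ],
            },
        },
        {
            "task_key": "task_2",
            "new_cluster": {
                "instance_pool_id": "0926-080838-lute60-pool-91skna4w",
            },
        },
    ],
}

print(tasks_missing_init_scripts(sample_run))  # prints ['task_2']
```

If the real response for a run started through the submit endpoint lists the init scripts but they still do not execute, the problem would lie on the cluster side rather than in how the operator builds the request.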

Please let me know why this happens. Is it intentional, or a known limitation?

Thank you.

Walter_C
Databricks Employee

Are you still facing issues with the job run submit API endpoint?