
Multi Task Job creation through Pulumi

Borkadd
New Contributor II

I am trying to create a multi-task Databricks Job in Azure Cloud with its own cluster.

Although I was able to create a single task job without any issues, the code to deploy the multi-task job fails due to the following cluster validation error:

error: 1 error occurred:
        * cannot create job: Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size
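
For context, the single-task version I had working looked roughly like this (simplified, placeholder values, not my exact code):

import pulumi_databricks as databricks

single_job = databricks.Job(
    resource_name="single-task-job",
    args=databricks.JobArgs(
        name="single-task-job",
        # Single-task (legacy) format: the cluster spec sits directly in new_cluster
        new_cluster=databricks.JobNewClusterArgs(
            spark_version="13.3.x-scala2.12",
            node_type_id="Standard_DS3_v2",
            num_workers=1,
        ),
        spark_python_task=databricks.JobSparkPythonTaskArgs(
            python_file="/pipelineExample/landing.py",
            source="GIT",
        ),
        git_source=databricks.JobGitSourceArgs(
            url="https://github.com/example/pipelineExample",  # placeholder
            provider="gitHub",
            branch="main",
        ),
    ),
)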

The code used to create the multi-task Job is the following:

job = Job(
            resource_name = f"{job_name}-job",
            args=JobArgs(
                name = f"{job_name}-job",
                # Shared single-node job cluster, referenced by each task via job_cluster_key
                job_clusters=[
                    JobJobClusterArgs(
                        job_cluster_key="pulumiTest-basic-cluster",
                        new_cluster=JobJobClusterNewClusterArgs(
                            spark_version="13.3.x-scala2.12",
                            cluster_name="",
                            num_workers=0,
                            node_type_id="Standard_DS3_v2",
                            enable_elastic_disk=True,
                            runtime_engine="STANDARD",
                            spark_conf={
                                # f"fs.azure.account.key.{self.storage_account_name}.dfs.core.windows.net": "{{secrets/pulumiTest-secret-scope/puluTest-storage-access-token}}"
                                "spark.master": "local[*,4]",
                                "spark.databricks.cluster.profile": "singleNode"
                            },
                            custom_tags={
                                "ResourceClass": "SingleNode"
                            },
                            data_security_mode="LEGACY_SINGLE_USER_STANDARD"
                        )
                    )
                ],
                # Per-task compute entries (compute_key / spec)
                computes=[
                    JobComputeArgs(
                        compute_key="landing_task",
                        spec=JobComputeSpecArgs(kind="spark_python_task")
                    ),
                    JobComputeArgs(
                        compute_key="staging_task",
                        spec=JobComputeSpecArgs(kind="spark_python_task")
                    ),
                    JobComputeArgs(
                        compute_key="refined_task",
                        spec=JobComputeSpecArgs(kind="spark_python_task")
                    )
                ],
                # Three chained Python tasks, all running on the shared job cluster
                tasks = [
                    JobTaskArgs(
                        task_key="landing_task",
                        job_cluster_key="pulumiTest-basic-cluster",
                        spark_python_task=JobSparkPythonTaskArgs(
                            python_file="/pipelineExample/landing.py",
                            source="GIT"
                        ),
                        run_if="ALL_SUCCESS",
                        libraries=[
                            JobLibraryArgs(
                                whl=whl_path
                            )
                        ]
                    ),
                    JobTaskArgs(
                        task_key="staging_task",
                        job_cluster_key="pulumiTest-basic-cluster",
                        spark_python_task=JobSparkPythonTaskArgs(
                            python_file="/pipelineExample/staging.py",
                            source="GIT"
                        ),
                        depends_ons=[
                            JobTaskDependsOnArgs(
                                task_key="landing_task"
                            )
                        ],
                        run_if="ALL_SUCCESS",
                        libraries=[
                            JobLibraryArgs(
                                whl=whl_path
                            )
                        ]
                    ),
                    JobTaskArgs(
                        task_key="refined_task",
                        job_cluster_key="pulumiTest-basic-cluster",
                        spark_python_task=JobSparkPythonTaskArgs(
                            python_file="/pipelineExample/refined.py",
                            source="GIT"
                        ),
                        depends_ons=[
                            JobTaskDependsOnArgs(
                                task_key="staging_task"
                            )
                        ],
                        run_if="ALL_SUCCESS",
                        libraries=[
                            JobLibraryArgs(
                                whl=whl_path
                            )
                        ]
                    )
                ],
                # Git repository providing the task source files
                git_source=JobGitSourceArgs(
                    url=git_url,
                    provider="gitHub",
                    branch="main"
                ),
                format="MULTI_TASK"
            )
)
pulumi.export('Job URL', job.url)

 Does anyone know where the problem could be?

1 REPLY

Borkadd
New Contributor II

Hello @Retired_mod, thanks for your answer, but the problem remains the same. I had already tested different cluster configurations, single-node and multi-node, including the configurations that worked with single-task jobs, but the error does not change: it is always about the new cluster size.

According to the documentation here: https://www.pulumi.com/registry/packages/databricks/api-docs/job/#jobnewcluster, I understand that I need to set the cluster specifications in the job_clusters parameter, not in new_cluster as with single-task jobs, roughly as in the sketch below.
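
Simplified sketch of how I read that (placeholder names and values, compute blocks left out, not a confirmed fix):

import pulumi_databricks as databricks

sketch_job = databricks.Job(
    resource_name="sketch-multi-task-job",
    args=databricks.JobArgs(
        name="sketch-multi-task-job",
        # Cluster spec declared once under job_clusters ...
        job_clusters=[
            databricks.JobJobClusterArgs(
                job_cluster_key="shared-cluster",
                new_cluster=databricks.JobJobClusterNewClusterArgs(
                    spark_version="13.3.x-scala2.12",
                    node_type_id="Standard_DS3_v2",
                    num_workers=1,
                ),
            )
        ],
        # ... and each task only references it via job_cluster_key
        tasks=[
            databricks.JobTaskArgs(
                task_key="example_task",
                job_cluster_key="shared-cluster",
                spark_python_task=databricks.JobSparkPythonTaskArgs(
                    python_file="/pipelineExample/landing.py",
                    source="GIT",
                ),
            )
        ],
        git_source=databricks.JobGitSourceArgs(
            url="https://github.com/example/pipelineExample",  # placeholder
            provider="gitHub",
            branch="main",
        ),
        format="MULTI_TASK",
    ),
)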
