I am trying to create a multi-task Databricks Job on Azure, with its own job cluster, using Pulumi. I was able to create a single-task job without any issues, but the code that deploys the multi-task job fails with the following cluster validation error:
error: 1 error occurred:
* cannot create job: Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size
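For reference, the single-task variant that deployed cleanly was roughly the following. This is a trimmed-down sketch, not my exact code: the resource name and Python file are illustrative, it uses the same single-node cluster settings as the multi-task version further down, and JobNewClusterArgs comes from the same pulumi_databricks imports shown in the full example below.

single_job = Job(
    resource_name=f"{job_name}-single-job",
    args=JobArgs(
        name=f"{job_name}-single-job",
        # Legacy single-task format: cluster and task are defined at the top level of JobArgs
        new_cluster=JobNewClusterArgs(
            spark_version="13.3.x-scala2.12",
            num_workers=0,
            node_type_id="Standard_DS3_v2",
            spark_conf={
                "spark.master": "local[*,4]",
                "spark.databricks.cluster.profile": "singleNode",
            },
            custom_tags={"ResourceClass": "SingleNode"},
        ),
        spark_python_task=JobSparkPythonTaskArgs(
            python_file="/pipelineExample/landing.py",
            source="GIT",
        ),
        git_source=JobGitSourceArgs(
            url=git_url,
            provider="gitHub",
            branch="main",
        ),
        libraries=[JobLibraryArgs(whl=whl_path)],
    ),
)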
The code that creates the multi-task Job is the following:
import pulumi
from pulumi_databricks import (
    Job,
    JobArgs,
    JobComputeArgs,
    JobComputeSpecArgs,
    JobGitSourceArgs,
    JobJobClusterArgs,
    JobJobClusterNewClusterArgs,
    JobLibraryArgs,
    JobSparkPythonTaskArgs,
    JobTaskArgs,
    JobTaskDependsOnArgs,
)

job = Job(
    resource_name=f"{job_name}-job",
    args=JobArgs(
        name=f"{job_name}-job",
        # Shared single-node job cluster that all three tasks reference by key
        job_clusters=[
            JobJobClusterArgs(
                job_cluster_key="pulumiTest-basic-cluster",
                new_cluster=JobJobClusterNewClusterArgs(
                    spark_version="13.3.x-scala2.12",
                    cluster_name="",
                    num_workers=0,
                    node_type_id="Standard_DS3_v2",
                    enable_elastic_disk=True,
                    runtime_engine="STANDARD",
                    spark_conf={
                        # f"fs.azure.account.key.{self.storage_account_name}.dfs.core.windows.net": "{{secrets/pulumiTest-secret-scope/puluTest-storage-access-token}}"
                        "spark.master": "local[*,4]",
                        "spark.databricks.cluster.profile": "singleNode",
                    },
                    custom_tags={
                        "ResourceClass": "SingleNode",
                    },
                    data_security_mode="LEGACY_SINGLE_USER_STANDARD",
                ),
            ),
        ],
        computes=[
            JobComputeArgs(
                compute_key="landing_task",
                spec=JobComputeSpecArgs(kind="spark_python_task"),
            ),
            JobComputeArgs(
                compute_key="staging_task",
                spec=JobComputeSpecArgs(kind="spark_python_task"),
            ),
            JobComputeArgs(
                compute_key="refined_task",
                spec=JobComputeSpecArgs(kind="spark_python_task"),
            ),
        ],
        # Three chained tasks: landing -> staging -> refined
        tasks=[
            JobTaskArgs(
                task_key="landing_task",
                job_cluster_key="pulumiTest-basic-cluster",
                spark_python_task=JobSparkPythonTaskArgs(
                    python_file="/pipelineExample/landing.py",
                    source="GIT",
                ),
                run_if="ALL_SUCCESS",
                libraries=[
                    JobLibraryArgs(
                        whl=whl_path,
                    ),
                ],
            ),
            JobTaskArgs(
                task_key="staging_task",
                job_cluster_key="pulumiTest-basic-cluster",
                spark_python_task=JobSparkPythonTaskArgs(
                    python_file="/pipelineExample/staging.py",
                    source="GIT",
                ),
                depends_ons=[
                    JobTaskDependsOnArgs(
                        task_key="landing_task",
                    ),
                ],
                run_if="ALL_SUCCESS",
                libraries=[
                    JobLibraryArgs(
                        whl=whl_path,
                    ),
                ],
            ),
            JobTaskArgs(
                task_key="refined_task",
                job_cluster_key="pulumiTest-basic-cluster",
                spark_python_task=JobSparkPythonTaskArgs(
                    python_file="/pipelineExample/refined.py",
                    source="GIT",
                ),
                depends_ons=[
                    JobTaskDependsOnArgs(
                        task_key="staging_task",
                    ),
                ],
                run_if="ALL_SUCCESS",
                libraries=[
                    JobLibraryArgs(
                        whl=whl_path,
                    ),
                ],
            ),
        ],
        git_source=JobGitSourceArgs(
            url=git_url,
            provider="gitHub",
            branch="main",
        ),
        format="MULTI_TASK",
    ),
)

pulumi.export('Job URL', job.url)
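For completeness, job_name, whl_path and git_url are plain strings defined earlier in the program; the values below are only illustrative placeholders, not my real ones:

job_name = "pulumiTest"                                                          # hypothetical job name
git_url = "https://github.com/my-org/pipelineExample"                            # hypothetical repository URL
whl_path = "dbfs:/FileStore/wheels/pipeline_example-0.1.0-py3-none-any.whl"      # hypothetical wheel location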
Does anyone know where the problem could be?