Hello community,
We are deploying a job with Databricks Asset Bundles, and the job should run on a single-node job cluster. Here is the DAB job definition:
resources:
  jobs:
    example_job:
      name: example_job
      tasks:
        - task_key: main_task
          job_cluster_key: ${var.job_cluster_prefix}
          python_wheel_task:
            package_name: example_package
            entry_point: entrypoint
            named_parameters:
              config-path: "/Workspace${workspace.file_path}/config/app.conf"
              environment: "${bundle.target}"
          libraries:
            - whl: ./dist/*.whl
      job_clusters:
        - job_cluster_key: ${var.job_cluster_prefix}
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: m4.2xlarge
            num_workers: 0
            aws_attributes:
              first_on_demand: 0
              availability: SPOT_WITH_FALLBACK
              zone_id: auto
              spot_bid_price_percent: 100
              ebs_volume_type: GENERAL_PURPOSE_SSD
              ebs_volume_count: 1
              ebs_volume_size: 100
      tags:
        costs/environment: ${bundle.target}
        costs/stage: ${var.costs_stage}
        service: ${var.service}
        domain: ${var.domain}
        owner: ${var.domain}
Since yesterday we have been facing the problem that the cluster spins up but does not run the code. Instead, the following warning is printed to the Log4j output: "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources"
The problem does not occur when I add one worker node or when I edit the job cluster via the UI, so I suspect the issue lies with the asset bundle deployment.
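One workaround I am considering is to declare the single-node profile explicitly in the bundle instead of relying on num_workers: 0 alone, i.e. adding the spark_conf and custom_tags values that a single-node cluster created via the UI gets. This is only a sketch based on the documented single-node cluster settings; I have not confirmed yet that it avoids the problem with the affected CLI versions:

job_clusters:
  - job_cluster_key: ${var.job_cluster_prefix}
    new_cluster:
      spark_version: 13.3.x-scala2.12
      node_type_id: m4.2xlarge
      num_workers: 0
      # explicit single-node configuration, matching what the UI sets for single-node clusters
      spark_conf:
        spark.databricks.cluster.profile: singleNode
        spark.master: "local[*, 4]"
      custom_tags:
        ResourceClass: SingleNode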
I checked the CLI releases, and the release notes for the newest release, v0.221.1, state: "This releases fixes an issue introduced in v0.221.0 where managing jobs with a single-node cluster would fail."
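In case it really is that CLI regression, I am also thinking about enforcing a minimum CLI version in databricks.yml so that deployments from an affected CLI version are rejected. This is just a sketch, assuming the databricks_cli_version constraint is the right mechanism for this (the bundle name is a placeholder):

bundle:
  name: example_bundle
  # refuse to deploy with CLI versions older than the release containing the fix
  databricks_cli_version: ">= 0.221.1"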
Another strange thing: locally I have CLI version v0.218.0 installed, and when I deploy the job from my machine, the code runs until an intended exception is raised. But instead of the job failing, it keeps running, and the same message, "WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", is written to the Log4j output.
Has anybody else experienced this issue and found a way to solve it?