07-22-2025 08:21 AM
Hi All,
Recently, all of my Databricks jobs have been failing with this error:
Run failed with error message Unexpected failure while waiting for the cluster (xxxx-xxxxxx-xxxxxxxx) to be ready: Cluster '(xxxx-xxxxxx-xxxxxxxx) is unhealthy.
Everything was running fine until July 21, but the jobs started failing on July 22 in both the dev and prod environments.
07-23-2025 04:00 AM
Hello @vishalv4476!
Could you please check the Event Log details for the failed clusters? They can help identify if the issue is related to the driver, network, libraries, or any init scripts. It's also possible the problem is due to changes made between these two dates.
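If the UI is not convenient, the same event log can likely be pulled from the Clusters REST API. The sketch below assumes `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are exported and that `curl` and `jq` are available locally; `events_payload` and `list_cluster_events` are hypothetical helper names, and the endpoint path and fields should be checked against the API reference for your workspace.

```shell
#!/bin/bash
# Sketch: list recent events for one cluster via the Clusters REST API.
# Assumes DATABRICKS_HOST (workspace URL) and DATABRICKS_TOKEN are exported.

# Build the JSON request body (hypothetical helper name).
# $1 = cluster ID, $2 = maximum number of events (defaults to 25).
events_payload() {
  printf '{"cluster_id":"%s","order":"DESC","limit":%d}' "$1" "${2:-25}"
}

# Fetch the events and print "timestamp type" per line, newest first.
list_cluster_events() {
  curl -sS --max-time 30 -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(events_payload "$1" "${2:-25}")" \
    "${DATABRICKS_HOST}/api/2.1/clusters/events" |
    jq -r '.events[] | "\(.timestamp) \(.type)"'
}
```

Calling `list_cluster_events "<cluster-id>" 50` and looking for a long gap after an init-script start event is one way to tell whether an init script is what stalls the cluster.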
07-23-2025 04:39 AM
Hi,
The issue is potentially with the init scripts: the script starts, runs for 50 minutes, and then the cluster that should execute the jobs is terminated.
The global init script we are using is:
#!/bin/bash
# Disable immediate exit on errors
set +e
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash || echo "Warning: Azure CLI installation failed."
sudo apt install jq || echo "Warning: Failed to install jq."
export resource_id=$(curl -s -H Metadata:true --noproxy "*" "some-api-xxx-xxx" | jq -r .compute.resourceId) || echo "Warning: Failed to fetch resource ID or resource ID is empty."
export msi_id="xxx-xxx-xxx-xxx"
export dcr_id="/subscriptions/xxxx-xxxx-xxxx-xxxx/resourceGroups/xx-xx-xx-xx/providers/Microsoft.Insights/dataCollectionRules/xx-xx-xx-xx"
az login --identity --username ${msi_id} || echo "Warning: Azure CLI login failed."
echo "Installing AMA for VM-${resource_id}"
az vm extension set --name AzureMonitorLinuxAgent --publisher Microsoft.Azure.Monitor --ids $resource_id --enable-auto-upgrade true || echo "Warning: Failed to install Azure Monitor Agent."
echo "Enabling DCRA for VM-${resource_id}"
az monitor data-collection rule association create --name "VM-4" --rule-id $dcr_id --resource $resource_id || echo "Warning: Failed to enable DCRA for VM-${resource_id}."
echo "Global init script completed."
exit 0
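Because the script uses `set +e` and `|| echo`, a step that fails is logged and skipped, but a step that hangs is not covered at all. A minimal sketch of one way to bound each step, assuming GNU coreutils `timeout` is available on the node; `run_step` is a hypothetical helper name that is not part of the script above, and it only works for external commands, not shell functions.

```shell
#!/bin/bash
# Hypothetical helper: run one init step with a time limit and warn on failure.
# $1 = label for the log line, $2 = limit in seconds, rest = command to run.
run_step() {
  local label="$1" limit="$2"
  shift 2
  # stdin comes from /dev/null so a prompt reads EOF instead of waiting for input.
  if timeout "$limit" "$@" </dev/null; then
    echo "OK: ${label}"
  else
    # $? here is the exit status of timeout (124 when the limit was hit).
    echo "Warning: ${label} failed or exceeded ${limit}s (exit $?)."
  fi
}
```

A step could then be written as `run_step "install jq" 300 sudo apt install jq`, so that a stalled command ends after five minutes with a warning instead of holding up cluster start.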
07-23-2025 04:45 AM
Event logs from job run:
07-24-2025 01:50 AM
Hi @vishalv4476,
Could you try adding the -y flag to ensure non-interactive installation: sudo apt -y install jq || echo 'Warning: Failed to install jq'
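A slightly fuller sketch of the same idea, assuming the init script runs with no terminal attached, so nobody can answer a confirmation prompt. `install_if_missing` is a hypothetical helper name; `DEBIAN_FRONTEND=noninteractive` and `apt-get install -y` are standard Debian/Ubuntu options.

```shell
#!/bin/bash
# Hypothetical helper: install a package only when its command is missing,
# and never wait for a confirmation prompt.
# $1 = command to look for, $2 = package name (defaults to the command name).
install_if_missing() {
  local cmd="$1" pkg="${2:-$1}"
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "${cmd} is already installed"
    return 0
  fi
  # -y answers yes to apt prompts; DEBIAN_FRONTEND silences debconf dialogs.
  sudo DEBIAN_FRONTEND=noninteractive apt-get install -y "$pkg" </dev/null
}
```

In the init script this would be called as `install_if_missing jq || echo "Warning: Failed to install jq."`, which keeps the existing warning style.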
07-24-2025 02:53 AM
My bad, just a silly thing to miss. Really appreciate your help, @SP_6721.