Cluster Failed to Start - Cluster scoped init script failed: Script exit status is non-zero
06-19-2024 08:40 PM
I have been using a cluster-scoped init script for around a year and everything was working fine. But suddenly, the Databricks cluster has failed to restart since last week Thursday (13th June 2024). It returns this error: "Failed to add 2 containers to the compute. Will attempt retry: false. Reason: Init script failure. Cluster scoped init script /EDH_ENIGMA/SQL_driver.sh failed: Script exit status is non-zero".
However, for some reason, the script is working fine in PROD. For your information, no changes have been made to the cluster or the init script in the last 6 months, and we have been using this script and configuration since last year. Even the day before the failures started, the cluster was working fine. So we want to check in with you in case you have any idea why this is happening. Attached below are a few screenshots highlighting this issue:
Here are a few things we have tried:
- Deleting the script and recreating it
- Changing the interpreter from /bin/bash to sh
- Changing the extension from .sh to .bash
- Adding the -y flag to the apt install commands
This is the init script that we have been using:
#!/bin/bash
curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql17
apt-get update
ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17
The same script is working in PROD.
Here is additional info about our Databricks version:
Policy : Unrestricted
Runtime Version : 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
Summary
06-19-2024 10:56 PM
How did you recreate the script in DEV? Are you on a Windows machine? It might be worth checking the file's line endings in VS Code to make sure they are LF (Unix line endings); these issues are usually caused by CRLF endings. I just tested your init script and it works fine for me.
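If you want to check from a terminal instead, here is a minimal sketch. The demo file created below is purely illustrative; the file name is taken from the error message in the original post:

```shell
# Create a demo script with Windows (CRLF) line endings, for illustration only.
printf '#!/bin/bash\r\necho hello\r\n' > SQL_driver.sh

# "file" reports "with CRLF line terminators" when the endings are wrong.
file SQL_driver.sh

# Strip the carriage returns in place (GNU sed, as on Databricks Ubuntu nodes).
sed -i 's/\r$//' SQL_driver.sh

# Reported again, now without the CRLF warning.
file SQL_driver.sh
```

A script with CRLF endings fails because the kernel sees the shebang as `#!/bin/bash\r` and cannot find that interpreter, which surfaces as a non-zero exit status.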
06-24-2024 07:44 PM
I created the .sh file in the Databricks workspace, not on a Windows machine or in VS Code. The execution was configured through a cluster-scoped init script.
06-25-2024 12:01 AM
Is the cluster configuration the same, i.e. Shared vs. No Isolation Shared?
There must be a discrepancy (maybe in cluster permissions) somewhere.
06-25-2024 12:14 AM
Just a thought: maybe there is no outbound connection in DEV from the cluster VNet to the URL you are trying to reach?
You can spin up an all-purpose cluster and test the connection with the %sh magic command.
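A minimal sketch of such a test, run in a notebook cell (prefixed with %sh) on an all-purpose cluster in the same VNet; the URL is the Microsoft repo the init script downloads from:

```shell
# Print only the HTTP status code for the repo the init script contacts.
# 200 means outbound connectivity is fine; 000 or a timeout points to a
# network/firewall problem on the DEV VNet.
curl -sS --connect-timeout 10 -o /dev/null -w '%{http_code}\n' \
  https://packages.microsoft.com/keys/microsoft.asc
```

Running the same command on the PROD cluster gives a direct comparison of the two environments' egress rules.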