I have been using a cluster-scoped init script for about a year, and everything was working fine. But since last Thursday (13th June 2024), our Databricks cluster has failed to restart. It returns this error: "Failed to add 2 containers to the compute. Will attempt retry: false. Reason: Init script failure. Cluster scoped init script /EDH_ENIGMA/SQL_driver.sh failed: Script exit status is non-zero".
However, the same script still works fine in PROD. No changes have been made to the cluster or the init script in the last 6 months, and we have been using this script and configuration since last year. Even the day before the failures started, the cluster was working fine. So we wanted to check whether you have any idea why this is happening. A few screenshots highlighting the issue are attached below.
Here are a few things we have tried:
- Deleting the script and recreating it
- Changing the init script from a /bin/bash script to an sh script
- Changing the extension from .sh to .bash
- Changing the install command in the script to sudo apt install with the -y flag
This is the init script that we have been using:
#!/bin/bash
curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql17
apt-get update
ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17
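Since the error only reports a non-zero exit status, one way to narrow it down is to log each step of the script separately. The following is a minimal sketch, not a tested fix: the names LOG_FILE, run_step and RUN_INSTALL are hypothetical and chosen for illustration, and the log path under /tmp is an assumption. The commands themselves are the ones from the script above.

```shell
#!/bin/bash
# Diagnostic sketch of the init script. LOG_FILE, run_step and RUN_INSTALL
# are hypothetical names introduced for illustration only.
LOG_FILE="${LOG_FILE:-/tmp/sql_driver_init.log}"

# Run one command, append its output and exit status to the log,
# and return that same exit status to the caller.
run_step() {
  local label="$1"
  shift
  echo "=== ${label}: $*" >> "$LOG_FILE"
  "$@" >> "$LOG_FILE" 2>&1
  local status=$?
  echo "=== ${label} exit status: ${status}" >> "$LOG_FILE"
  return "$status"
}

# The real install only runs when RUN_INSTALL=1 is set (an assumed guard),
# so the helper can be tried out without touching the machine.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  # pipefail makes a failed curl visible instead of being masked by the pipe.
  run_step "add key" bash -o pipefail -c \
    'curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -' || exit 1
  run_step "add repo" bash -c \
    'curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list' || exit 1
  run_step "apt update" apt-get update || exit 1
  run_step "install driver" env ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17 || exit 1
fi
```

With RUN_INSTALL=1 set, the first step that fails stops the script, and the log file should show which command it was and what it printed.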
Init Script failed in DEV
Same script working in PROD
Here is additional info about our Databricks version:
Policy : Unrestricted
Runtime Version : 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
Summary
1-2 Workers: 32-64 GB Memory, 8-16 Cores
1 Driver: 32 GB Memory, 8 Cores
Runtime: 13.3.x-scala2.12