Databricks Community

virementz · ‎06-19-2024

I have been using a cluster-scoped init script for about a year, and everything was working fine. But since last Thursday (13 June 2024), the Databricks cluster has failed to restart. It returns this error: "Failed to add 2 containers to the compute. Will attempt retry: false. Reason: Init script failure. Cluster scoped init script /EDH_ENIGMA/SQL_driver.sh failed: Script exit status is non-zero".

However, for some reason, the same script works fine in PROD. For your information, no changes have been made to the cluster or the init script in the last 6 months, and we have been using this script and configuration since last year. Even the day before the failures started, the cluster was working fine. So we wanted to check whether you have any idea why this is happening. Attached below are a few screenshots highlighting this issue:

Here are a few things we have tried:

Deleting the script and recreating it
Changing the init script from a /bin/bash script to an sh script
Changing the extension from .sh to .bash
Changing the script to use sudo apt install with the -y flag

This is the init script that we have been using:

#!/bin/bash

curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql17
apt-get update
ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17
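
Because the error only reports a non-zero exit status, it may help to log each step so the failing command shows up in whatever output is captured for the init script. A minimal sketch, assuming a hypothetical `run_step` helper that is not part of Databricks or of the original script:

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): run one named step,
# log its outcome, and pass the command's exit status back to the caller.
run_step() {
    name=$1
    shift
    echo "[init] starting: $name"
    if "$@"; then
        echo "[init] ok: $name"
    else
        status=$?
        echo "[init] FAILED: $name (exit status $status)" >&2
        return "$status"
    fi
}
```

A step from the script above could then be written as `run_step "apt update" apt-get update || exit 1`. Steps that use a pipe or a redirect would need to be wrapped first, for example in `sh -c '...'`.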

Init Script failed in DEV

Same script working in PROD

Here is additional info about our Databricks version:

Policy : Unrestricted

Runtime Version : 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)

Summary

1-2 Workers: 32-64 GB Memory, 8-16 Cores

1 Driver: 32 GB Memory, 8 Cores

Runtime: 13.3.x-scala2.12

jacovangelder · ‎06-19-2024

How did you recreate the script in DEV? Are you on a Windows machine? It might be worth checking the file's line endings in VS Code to make sure they are LF (Unix line endings). These issues usually occur because of CRLF line endings. I just tested your init script and it works fine for me.
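
The line-ending check can also be done from a shell instead of an editor. A minimal sketch, assuming two hypothetical helper functions (`has_crlf` and `strip_crlf` are illustrative names, not Databricks tooling) and a copy of the script that is reachable from a shell:

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Return 0 if the file contains at least one carriage return,
# which is what CRLF (Windows) line endings leave behind.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Rewrite the file in place with all carriage returns removed.
# Writing back with cat keeps the file's existing permissions.
strip_crlf() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `has_crlf SQL_driver.sh && strip_crlf SQL_driver.sh` would convert the file only if it contains carriage returns.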

virementz · ‎06-24-2024

I created the .sh file in the Databricks workspace, not on a Windows machine or in VS Code. Its execution was configured as a cluster-scoped init script.

jacovangelder · ‎06-25-2024

Is the cluster configuration the same, i.e. Shared vs. No Isolation Shared?
There must be a discrepancy somewhere (maybe in cluster permissions).

Wojciech_BUK · ‎06-25-2024

Just a thought: maybe there is no outbound connectivity in DEV from the cluster VNet to the URLs you are trying to reach?
You can spin up an all-purpose cluster and test the connection with the %sh magic command.
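
A connectivity test along those lines might look like the sketch below, run in a %sh cell. The `check_url` function name and the 15-second default timeout are assumptions chosen for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report whether a URL can be fetched from this machine.
# $1 is the URL; $2 is an optional timeout in seconds (default 15).
check_url() {
    if curl --fail --silent --show-error --location \
            --max-time "${2:-15}" --output /dev/null "$1"; then
        echo "OK   $1"
    else
        status=$?
        echo "FAIL $1 (curl exit code $status)"
        return 1
    fi
}
```

Calling `check_url https://packages.microsoft.com/keys/microsoft.asc` and `check_url https://packages.microsoft.com/config/ubuntu/22.04/prod.list` on both DEV and PROD would show whether the two downloads in the init script are reachable from each environment.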

Cluster Failed to Start - Cluster scoped init script failed: Script exit status is non-zero
