Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cluster Failed to Start - Cluster scoped init script failed: Script exit status is non-zero

virementz
New Contributor II

I have been using a cluster-scoped init script for around a year and everything was working fine. But the Databricks cluster has suddenly failed to restart since last Thursday (13th June 2024). It returns this error: "Failed to add 2 containers to the compute. Will attempt retry: false. Reason: Init script failure. Cluster scoped init script /EDH_ENIGMA/SQL_driver.sh failed: Script exit status is non-zero".

 

However, for some reason, the same script is working fine in PROD. For your information, no changes have been made to the cluster or the init script in the last 6 months, and we have been using this script and configuration since last year. Even the day before the failure started, the cluster was working fine. So we want to check with you in case you have any idea why this is happening. Attached below are a few screenshots highlighting the issue:

Here are a few things we have tried:

  1. Deleted the script and recreated it
  2. Changed the init script shebang from /bin/bash to sh
  3. Changed the file extension from .sh to .bash
  4. Added the -y flag to the sudo apt install command in the script

 

This is the init script that we have been using:

 

#!/bin/bash

# Add the Microsoft package signing key and the Ubuntu 22.04 package repository
curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# Install msodbcsql17
apt-get update
ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17
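
As a debugging aid (not the script currently deployed), a sketch of the same steps with shell tracing and fail-fast options could make the failing command visible in the cluster's init script logs, assuming cluster log delivery is configured:

#!/bin/bash
# Debugging sketch: -e stops at the first failing command, -u flags unset variables,
# -x echoes each command, -o pipefail propagates failures through pipes.
set -euxo pipefail

# Add the Microsoft package signing key and the Ubuntu 22.04 package repository
curl --silent https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl --silent --fail https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list

# Install msodbcsql17, accepting the EULA non-interactively
apt-get update
ACCEPT_EULA=Y apt-get --quiet --yes install msodbcsql17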

 

 

 
[Screenshot: Init script failed in DEV]

[Screenshot: Same script working in PROD]

Here is additional info about our Databricks version:

Policy : Unrestricted

Runtime Version : 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)

Summary:

Workers: 1-2 (32-64 GB Memory, 8-16 Cores)
Driver: 1 (32 GB Memory, 8 Cores)
Runtime: 13.3.x-scala2.12

 

4 REPLIES

jacovangelder
Honored Contributor

How did you recreate the script in DEV? Are you on a Windows machine? It might be worth checking the file's line endings in VS Code to make sure they are LF (Unix line endings); these issues usually come down to that. I just tested your init script and it works fine for me.
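
If you want to check that from a notebook, something like the cell below could work (assuming workspace files are visible under /Workspace on the cluster; the path is the one from the error message):

%sh
# Look for Windows (CRLF) line endings in the stored init script.
# "with CRLF line terminators" in the output of `file` would indicate the problem.
file /Workspace/EDH_ENIGMA/SQL_driver.sh
# cat -A shows carriage returns as ^M at the end of each line
cat -A /Workspace/EDH_ENIGMA/SQL_driver.sh | head -n 5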

I created the .sh file in the Databricks workspace; I'm not using a Windows machine or VS Code for this. The execution was configured through a cluster-scoped init script.

Is the cluster configuration the same? I.e. Shared vs. No Isolation Shared?
There must be a discrepancy (maybe in cluster permissions) somewhere. 
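
If the Databricks CLI is configured against both workspaces, one way to hunt for that discrepancy could be to dump and diff the two cluster specs (legacy CLI syntax; the profile names and cluster IDs below are placeholders):

# Sketch with the legacy Databricks CLI; DEV/PROD profiles and cluster IDs are placeholders.
databricks clusters get --cluster-id <dev-cluster-id> --profile DEV > dev_cluster.json
databricks clusters get --cluster-id <prod-cluster-id> --profile PROD > prod_cluster.json
# Compare the two JSON specs
diff dev_cluster.json prod_cluster.json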

Wojciech_BUK
Valued Contributor III

Just a thought: maybe there is no outbound connection on DEV from the cluster VNet to the URL you are trying to reach?
You can spin up an all-purpose cluster and test the connection with the %sh magic command.
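
For example, a sketch like this in a notebook cell on the DEV cluster could show whether the Microsoft endpoints are reachable at all:

%sh
# Headers-only requests against the two URLs the init script uses;
# a timeout or connection error here would point to a network/egress problem on DEV.
curl -sSI --max-time 30 https://packages.microsoft.com/keys/microsoft.asc | head -n 1
curl -sSI --max-time 30 https://packages.microsoft.com/config/ubuntu/22.04/prod.list | head -n 1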
