Init Script Fails Intermittently on Workflow Job

leungi
Contributor

An init script is used to install system libraries, per below.

Attaching the script to a Personal Compute cluster works consistently. The same script, attached to a Workflows job through the cluster config, fails intermittently, as shown in the error message below.

Both the Personal and Workflow clusters are on the 14.3 LTS runtime, so I am surprised by the instability of the latter.

Any troubleshooting advice is appreciated.

Init Script

#!/bin/bash
set -euxo pipefail
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  sudo apt-get -y update && apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev
fi

Error

[Screenshot of the error message: leungi_0-1718291897408.png]

amr
Databricks Employee

Check the cluster event log for clues about why the script is failing. If the script fails and returns a non-zero exit status, the cluster won't start.
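To make the failing step easier to spot in whatever output the init script produces, one option is to wrap each command so that its name and exit status are printed before the script exits. This is only a minimal sketch: `run_step` is a hypothetical helper name, not something Databricks provides, and it assumes the script's stderr is captured somewhere you can read it.

```shell
#!/bin/bash
# Hypothetical helper (not part of Databricks): run one step and, if it
# fails, print which step failed and its exit status to stderr before
# returning that same status to the caller.
run_step() {
  local name="$1"
  shift
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "init-script step '${name}' failed with exit status ${status}" >&2
  fi
  return "$status"
}

# Example usage inside an init script (commented out so the sketch has
# no side effects when sourced or run as-is):
# run_step "apt-get update" apt-get update -y || exit 1
```

Because the helper returns the original exit status, the cluster still fails to start on an error, but the output says which command caused it.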

Thanks for the suggestion @amr.

Courtesy of a DBX solution engineer, the key was to remove all the files in the /var/lib/apt/lists/ directory, which forces apt to download fresh package lists on the next update.

Init Script

#!/bin/bash
set -euxo pipefail
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  # --- Clear cache
  # -f keeps rm from aborting the script (set -e) if a glob matches nothing
  rm -rf /var/cache/apt/archives/* /var/lib/apt/lists/*
  sudo apt-get clean -y
  sudo apt-get update -y
  #---
  sudo apt-get -y update && sudo apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev
fi
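
If the failures turn out to be transient (for example a package mirror that is briefly unreachable), retrying the apt-get update step a few times may also help. That cause is an assumption on my part and is not confirmed in this thread. A rough sketch follows, with `retry` as a hypothetical helper name and the attempt count and delay chosen arbitrarily:

```shell
#!/bin/bash
# Hypothetical helper: retry a command up to max_attempts times,
# sleeping delay_seconds between attempts. Returns 0 as soon as the
# command succeeds, or 1 once all attempts are used up.
retry() {
  local max_attempts="$1"
  local delay_seconds="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example usage inside the init script (commented out so the sketch has
# no side effects when sourced or run as-is):
# retry 3 10 apt-get update -y
```

This does not replace clearing /var/lib/apt/lists/; it only covers the case where a fresh download itself fails now and then.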
 
