Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Init Script Fails Intermittently on Workflow Job

leungi
New Contributor III

An init script (below) is used to install system libraries.

Adding the script to a Personal Compute cluster works consistently. The same script, added to a Workflows job via the cluster config, fails intermittently with the error shown below.

Both the Personal and Workflow clusters are on the 14.3 LTS runtime, so I'm surprised by the instability of the latter.

Any troubleshooting advice is appreciated.

Init Script

#!/bin/bash
set -euxo pipefail
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  sudo apt-get -y update && apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev
fi

Error

[Screenshot of the error message: leungi_0-1718291897408.png]

1 ACCEPTED SOLUTION

leungi
New Contributor III

Thanks for the suggestion @amr.

Courtesy of a DBX solution engineer, the key was to remove all the files in the /var/lib/apt/lists/ directory, which forces apt to download fresh package lists during the subsequent update.

Init Script

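#!/bin/bash
set -euxo pipefail
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  # --- Clear cached packages and package lists.
  # -f keeps rm from failing under `set -e` when the directories are already empty.
  sudo rm -rf /var/cache/apt/archives/* /var/lib/apt/lists/*
  sudo apt-get clean -y
  # --- Download fresh package lists, then install the system libraries.
  sudo apt-get update -y
  sudo apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev
fi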
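
Because the failure is intermittent, one further hedge is to retry the package-list refresh a few times before giving up. The following is a minimal sketch and not part of the original solution: retry is a hypothetical helper, and the attempt count and delay are assumptions to tune.

```shell
#!/bin/bash
set -euo pipefail

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # A command used as the `until` condition does not trigger `set -e`.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: '$*' failed after ${attempt} attempts" >&2
      return 1
    fi
    echo "retry: attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

In the init script, the refresh step could then be written as retry 3 10 sudo apt-get update -y, so that a single transient failure does not abort the cluster start.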
 


2 REPLIES

amr
Valued Contributor

Check the cluster event log for a clue as to why the script is failing. If the script fails and returns a non-zero exit status, the cluster won't start.
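
To make the failing step easier to spot, each command in the init script can be wrapped so that its output and exit status are written to a file. This is a minimal sketch under assumptions: run_logged is a hypothetical helper and the log path is whatever location you choose; neither comes from this thread.

```shell
#!/bin/bash
set -uo pipefail

# run_logged LOG_FILE COMMAND [ARGS...]
# Hypothetical helper: appends the command line, its stdout/stderr, and its
# exit status to LOG_FILE, then returns the command's own exit status.
run_logged() {
  local log_file="$1"
  shift
  local status=0
  echo "+ $*" >> "$log_file"
  "$@" >> "$log_file" 2>&1 || status=$?
  echo "exit status: ${status}" >> "$log_file"
  return "$status"
}
```

For example, run_logged /tmp/init_apt.log apt-get update -y || exit 1 (the path is illustrative) keeps the apt output for later inspection and still exits with a non-zero status when the step fails.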
