Shorten Classic Cluster startup time

NW1000 · 03-09-2026

We use R notebooks to generate our workflows, so we have to use classic clusters. We need roughly 10 R packages plus 2 PyPI packages, and the cluster takes 10-20 minutes to start. We found that most of that time is spent on package installation. I tried to pre-install the R packages to a volume:

# Run this ONCE on a running cluster, then save the library path
lib_path <- "/Volumes/datalake/test/rlib_cache"
# Uncomment if the directory does not exist yet
#dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)

packages <- c("mmrm", "emmeans", "striprtf", "pandoc",
              "glmmTMB", "kableExtra", "rtables",
              "tinytex", "tern")

# HTTPUserAgent is an R option, not an install.packages() argument,
# so it has to be set with options() before installing
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(),
  paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])))

install.packages(packages,
                 lib = lib_path,
                 repos = c(CRAN = "https://packagemanager.posit.co/cran/__linux__/noble/2025-03-20"),
                 Ncpus = parallel::detectCores())
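Before relying on the cache, it may be worth confirming that the install step above actually populated the volume. The following is a minimal sketch, not something from my original setup: missing_r_packages is a hypothetical helper name, and it relies only on the fact that an installed R package is a directory named after the package that contains a DESCRIPTION file.

```bash
#!/bin/bash
# Hypothetical helper (illustrative only): print the name of every package
# that has no DESCRIPTION file under the given library directory.
# Usage: missing_r_packages LIB_DIR PKG [PKG ...]
missing_r_packages() {
  local lib_dir="$1"
  shift
  local pkg
  for pkg in "$@"; do
    # An installed package lives at LIB_DIR/<pkg>/DESCRIPTION
    if [ ! -f "${lib_dir}/${pkg}/DESCRIPTION" ]; then
      echo "${pkg}"
    fi
  done
}

# Example call with the path and package names from this post:
# missing_r_packages /Volumes/datalake/test/rlib_cache \
#   mmrm emmeans striprtf pandoc glmmTMB kableExtra rtables tinytex tern
```

If the call prints nothing, every listed package has a directory in the cache; any name it prints was not installed there.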

Then I set up the following .sh file as an init script for the classic cluster:

#!/bin/bash
set -uo pipefail
exec > /tmp/init-r-libs.log 2>&1

echo "=== R Library Init started at $(date -u) ==="

CUSTOM_R_LIBS="/Volumes/datalake/test/rlib_cache"

# Use Rprofile.site: this runs AFTER Databricks sets up its R environment,
# so the custom path will persist
cat <<EOF | sudo tee -a /usr/lib/R/etc/Rprofile.site
# --- Custom R Library Path (added by init script) ---
local({
  custom_lib <- "${CUSTOM_R_LIBS}"
  if (dir.exists(custom_lib)) {
    .libPaths(c(custom_lib, .libPaths()))
  }
})
EOF

echo "Custom R library path added to Rprofile.site: $CUSTOM_R_LIBS"
echo "=== R Library Init completed at $(date -u) ==="

But a cluster started with this init script did not have the R packages installed, so this approach did not work.
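To narrow down which step failed, one option is to check the two things the init script depends on: that the library directory is visible on the node, and that Rprofile.site actually received the entry. This is only a rough sketch under assumptions: check_r_lib_setup is a hypothetical helper name, and I assume it is run from a shell on the started cluster (for example a %sh cell), next to the /tmp/init-r-libs.log file that the init script writes.

```bash
#!/bin/bash
# Hypothetical diagnostic helper (illustrative only).
# Prints one status line per check so the failing step is easy to spot.
# Usage: check_r_lib_setup LIB_DIR RPROFILE_PATH
check_r_lib_setup() {
  local lib_dir="$1"    # e.g. /Volumes/datalake/test/rlib_cache
  local rprofile="$2"   # e.g. /usr/lib/R/etc/Rprofile.site

  # Check 1: is the cached library directory visible from this node?
  if [ -d "${lib_dir}" ]; then
    echo "library dir: found"
  else
    echo "library dir: missing"
  fi

  # Check 2: does Rprofile.site mention the library path?
  # grep -F treats the path as a fixed string, -q suppresses output.
  if [ -f "${rprofile}" ] && grep -qF "${lib_dir}" "${rprofile}"; then
    echo "Rprofile.site entry: found"
  else
    echo "Rprofile.site entry: missing"
  fi
}

# Example call with the paths from this post:
# check_r_lib_setup /Volumes/datalake/test/rlib_cache /usr/lib/R/etc/Rprofile.site
```

If both lines report "found" and the packages still do not load, the next thing to compare would be the output of .libPaths() inside the R notebook against the path above.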

Is there any way to shorten the cluster startup time? Thank you.