Shorten Classic Cluster start up time
03-09-2026 01:46 PM
We use R notebooks to generate our workflows, so we have to use classic clusters. We need roughly 10 additional R packages in addition to 2 PyPI packages, and the cluster takes at least 10-20 minutes to start. We found that most of that time is spent on package installation. I tried to pre-install the packages to a volume:
# Run this ONCE on a running cluster, then save the library path
lib_path <- "/Volumes/datalake/test/rlib_cache"
#dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)
packages <- c("mmrm", "emmeans", "striprtf", "pandoc",
              "glmmTMB", "kableExtra", "rtables",
              "tinytex", "tern")
# HTTPUserAgent is read from options(), so set it there
# rather than passing it as an argument to install.packages()
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(),
  paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])))
install.packages(packages,
                 lib = lib_path,
                 Ncpus = parallel::detectCores())
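To confirm that the packages actually landed in the cache directory, a check along these lines could be run from a shell on the cluster. This is only a minimal sketch: `missing_r_packages` is a hypothetical helper name, and it relies solely on the fact that an installed R package is a directory containing a DESCRIPTION file.

```shell
#!/bin/bash
# Sketch: report which of the expected R packages are NOT present in a
# library directory. missing_r_packages is a hypothetical helper name.

missing_r_packages() {
  local lib_dir="$1"   # library directory to inspect
  shift                # remaining arguments are package names
  local pkg
  for pkg in "$@"; do
    # An installed package is a directory holding a DESCRIPTION file.
    [ -f "$lib_dir/$pkg/DESCRIPTION" ] || echo "$pkg"
  done
}
```

Calling `missing_r_packages /Volumes/datalake/test/rlib_cache mmrm emmeans tern` prints one line per package that is missing and prints nothing when all of them are present. Dependencies pulled in by those packages are not covered by this check.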
Then I set up the following .sh file as an init script for the classic cluster:
#!/bin/bash
set -uo pipefail
exec > /tmp/init-r-libs.log 2>&1
echo "=== R Library Init started at $(date -u) ==="
CUSTOM_R_LIBS="/Volumes/datalake/test/rlib_cache"
# Use Rprofile.site — this runs AFTER Databricks sets up its R environment
# so the custom path will persist
cat <<EOF | sudo tee -a /usr/lib/R/etc/Rprofile.site
# --- Custom R Library Path (added by init script) ---
local({
  custom_lib <- "${CUSTOM_R_LIBS}"
  if (dir.exists(custom_lib)) {
    .libPaths(c(custom_lib, .libPaths()))
  }
})
EOF
echo "Custom R library path added to Rprofile.site: $CUSTOM_R_LIBS"
echo "=== R Library Init completed at $(date -u) ==="
However, the cluster started with this init script did not have the R packages installed, so the approach did not work.
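One way to narrow this down might be to have the init script log whether the volume path is visible at the moment the script runs, and to make the append step safe to repeat. The following is a sketch under assumptions, not a confirmed fix: `add_r_lib_path` is a hypothetical helper, and whether `/Volumes` is mounted during init script execution is exactly the thing it is meant to reveal.

```shell
#!/bin/bash
# Sketch: append a custom library path to an Rprofile file only once, and
# report whether the library directory is visible when the script runs.
# add_r_lib_path is a hypothetical helper, not a Databricks API.

add_r_lib_path() {
  local rprofile="$1"   # file to append to, e.g. /usr/lib/R/etc/Rprofile.site
  local lib_dir="$2"    # library directory, e.g. /Volumes/datalake/test/rlib_cache
  local marker="# --- Custom R Library Path (added by init script) ---"

  # Log visibility of the library directory at the time the script runs.
  if [ -d "$lib_dir" ]; then
    echo "visible: $lib_dir"
  else
    echo "NOT visible: $lib_dir"
  fi

  # Skip the append if the marker line is already in the file.
  if [ -f "$rprofile" ] && grep -qF -- "$marker" "$rprofile"; then
    echo "already configured: $rprofile"
    return 0
  fi

  # Same R snippet as in the init script above; the dir.exists() check is
  # evaluated later, when R starts, not when this script runs.
  cat >> "$rprofile" <<EOF
$marker
local({
  custom_lib <- "$lib_dir"
  if (dir.exists(custom_lib)) {
    .libPaths(c(custom_lib, .libPaths()))
  }
})
EOF
  echo "configured: $rprofile"
}
```

In the init script this would be called as `add_r_lib_path /usr/lib/R/etc/Rprofile.site "$CUSTOM_R_LIBS"`, with the output ending up in `/tmp/init-r-libs.log`. Running `.libPaths()` in a notebook afterwards would then show whether the custom path survived R startup on the cluster.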
Is there any way to shorten the cluster startup time? Thank you.