<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Shorten Classic Cluster start up time in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150668#M53493</link>
    <description></description>
    <pubDate>Thu, 12 Mar 2026 10:34:26 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2026-03-12T10:34:26Z</dc:date>
    <item>
      <title>Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150411#M53415</link>
      <description>&lt;P&gt;We use R notebooks to generate workflows, so we have to use classic clusters. We need roughly 10 R packages in addition to 2 PyPI packages, and the cluster takes 10-20 minutes to start. We found that most of that time goes to package installation. I tried to pre-install the packages to a volume:&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# Run this ONCE on a running cluster, then save the library path&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;lib_path &amp;lt;- &lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/datalake/test/rlib_cache"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;#dir.create(lib_path, recursive = TRUE, showWarnings = FALSE)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;packages &amp;lt;- c(&lt;/SPAN&gt;&lt;SPAN&gt;"mmrm"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"emmeans"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"striprtf"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"pandoc"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"glmmTMB"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"kableExtra"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"rtables"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;"tinytex"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"tern"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;install.packages(packages,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;lib = lib_path,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;repos = c(CRAN = &lt;/SPAN&gt;&lt;SPAN&gt;"&lt;A href="https://packagemanager.posit.co/cran/__linux__/noble/2025-03-20" target="_blank"&gt;https://packagemanager.posit.co/cran/__linux__/noble/2025-03-20&lt;/A&gt;"&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;HTTPUserAgent = sprintf(&lt;/SPAN&gt;&lt;SPAN&gt;"R/%s R (%s)"&lt;/SPAN&gt;&lt;SPAN&gt;, 
getRversion(),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;paste(getRversion(), R.version[&lt;/SPAN&gt;&lt;SPAN&gt;"platform"&lt;/SPAN&gt;&lt;SPAN&gt;], R.version[&lt;/SPAN&gt;&lt;SPAN&gt;"arch"&lt;/SPAN&gt;&lt;SPAN&gt;], R.version[&lt;/SPAN&gt;&lt;SPAN&gt;"os"&lt;/SPAN&gt;&lt;SPAN&gt;])),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Ncpus = parallel::detectCores())&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Then set up .sh as init script for the classic cluster:&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;#!/bin/bash&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;set&lt;/SPAN&gt; &lt;SPAN&gt;-uo&lt;/SPAN&gt; &lt;SPAN&gt;pipefail&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;exec&lt;/SPAN&gt; &lt;SPAN&gt;&amp;gt;&lt;/SPAN&gt; &lt;SPAN&gt;/tmp/init-r-libs.log&lt;/SPAN&gt; &lt;SPAN&gt;2&amp;gt;&amp;amp;1&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;echo&lt;/SPAN&gt; &lt;SPAN&gt;"=== R Library Init started at $(&lt;/SPAN&gt;&lt;SPAN&gt;date&lt;/SPAN&gt; &lt;SPAN&gt;-u&lt;/SPAN&gt;&lt;SPAN&gt;) ==="&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;CUSTOM_R_LIBS&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;"/Volumes/datalake/test/rlib_cache"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# Use Rprofile.site — this runs AFTER Databricks sets up its R environment&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;# so the custom path will persist&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cat&lt;/SPAN&gt; &lt;SPAN&gt;&amp;lt;&amp;lt;&lt;/SPAN&gt;&lt;SPAN&gt;EOF&lt;/SPAN&gt; &lt;SPAN&gt;|&lt;/SPAN&gt; &lt;SPAN&gt;sudo&lt;/SPAN&gt; &lt;SPAN&gt;tee&lt;/SPAN&gt; &lt;SPAN&gt;-a&lt;/SPAN&gt; &lt;SPAN&gt;/usr/lib/R/etc/Rprofile.site&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;# --- Custom R Library Path (added by init script) ---&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;local({&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;custom_lib &amp;lt;- 
"${&lt;/SPAN&gt;&lt;SPAN&gt;CUSTOM_R_LIBS&lt;/SPAN&gt;&lt;SPAN&gt;}"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;if (dir.exists(custom_lib)) {&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;.libPaths(c(custom_lib, .libPaths()))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;})&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;EOF&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;echo&lt;/SPAN&gt; &lt;SPAN&gt;"Custom R library path added to Rprofile.site: &lt;/SPAN&gt;&lt;SPAN&gt;$CUSTOM_R_LIBS&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;echo&lt;/SPAN&gt; &lt;SPAN&gt;"=== R Library Init completed at $(&lt;/SPAN&gt;&lt;SPAN&gt;date&lt;/SPAN&gt; &lt;SPAN&gt;-u&lt;/SPAN&gt;&lt;SPAN&gt;) ==="&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;But this cluster did not have the R packages installed. Failed to work.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;Is there any way to shorten the cluster start up time? Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 20:46:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150411#M53415</guid>
      <dc:creator>NW1000</dc:creator>
      <dc:date>2026-03-09T20:46:06Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150413#M53417</link>
      <description>&lt;P class=""&gt;The reason your Volume-based cache isn't working is a credential scoping issue. Databricks only injects UC Volume credentials into init scripts that are themselves stored on a UC Volume. If your init script lives in workspace files or cloud storage, it can't actually read from /Volumes/datalake/test/rlib_cache at execution time, even though the path looks fine and your R code works in a running notebook.&lt;/P&gt;&lt;P class=""&gt;The fix: move your init script to the same Volume (e.g., /Volumes/datalake/test/scripts/init_r.sh). But I'd also change the strategy slightly. Instead of pointing .libPaths() at the Volume path, copy the packages to a local directory during init. Reading libraries over the FUSE mount adds noticeable latency on every library() call.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-bash"&gt;#!/bin/bash
cp -R /Volumes/datalake/test/rlib_cache/* /usr/local/lib/R/site-library/ 2&amp;gt;/dev/null
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class=""&gt;That copy takes maybe 10-30 seconds for your package set. Way better than 20 minutes of compilation.&lt;/P&gt;&lt;P class=""&gt;Hope this helps! If it does, mark it as a solution!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 21:05:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150413#M53417</guid>
      <dc:creator>Kirankumarbs</dc:creator>
      <dc:date>2026-03-09T21:05:11Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150578#M53476</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182577"&gt;@NW1000&lt;/a&gt;&amp;nbsp;— good question, and your instinct to pre-compile and cache is the right one. There are three separate things working against you here, and fixing all three should collapse that 10-20 minute startup significantly.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;You're pulling source builds, not binaries.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Your PPM URL points at &lt;CODE&gt;__linux__/noble/...&lt;/CODE&gt;, but many Databricks Runtimes (14.x and 15.x, for example) run on Ubuntu 22.04 (jammy); you can confirm what yours reports with &lt;CODE&gt;lsb_release -c --short&lt;/CODE&gt;. When PPM can't match the distro, it silently falls back to compiling from source, and that's almost certainly where most of your time is going. If your runtime is on jammy, switch to:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;repos &amp;lt;- c(CRAN = "https://packagemanager.posit.co/cran/__linux__/jammy/latest")
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Precompiled binaries should cut install time from minutes to seconds per package.&lt;/P&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;Direct install to a Volume path isn't reliable.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;CODE&gt;install.packages(..., lib = "/Volumes/...")&lt;/CODE&gt; doesn't behave consistently across driver and workers. The documented pattern is two steps — install to the default library location first, then copy to the Volume:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-r"&gt;# Run once on a "build" cluster with the same DBR you'll use in production
pkgs &amp;lt;- c("mmrm","emmeans","striprtf","pandoc",
          "glmmTMB","kableExtra","rtables",
          "tinytex","tern")
repos &amp;lt;- c(CRAN = "https://packagemanager.posit.co/cran/__linux__/jammy/latest")
install.packages(pkgs, repos = repos)

# Then copy to Volume
volume_pkgs &amp;lt;- "/Volumes/&amp;lt;catalog&amp;gt;/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/r_libs"
dir.create(volume_pkgs, recursive = TRUE, showWarnings = FALSE)
sapply(pkgs, function(p) {
  file.copy(from = find.package(p),
            to   = volume_pkgs,
            recursive = TRUE)
})
&lt;/CODE&gt;&lt;/PRE&gt;
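&lt;P&gt;Before pointing a cluster at the cache, it may be worth confirming that the copy actually landed. Here is a minimal sketch, assuming each copied package directory contains the &lt;CODE&gt;DESCRIPTION&lt;/CODE&gt; file that installed R packages carry. The function name &lt;CODE&gt;missing_pkgs&lt;/CODE&gt; is hypothetical and the Volume path in the example is a placeholder:&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: print every package that is missing from a library directory.
# Usage: missing_pkgs LIB_DIR PKG [PKG ...]
missing_pkgs() {
  local lib_dir="$1"
  shift
  local pkg
  for pkg in "$@"; do
    # An installed R package is a directory holding a DESCRIPTION file.
    if [ ! -f "$lib_dir/$pkg/DESCRIPTION" ]; then
      printf '%s\n' "$pkg"
    fi
  done
}

# Example call (placeholder path, replace with your own Volume):
# missing_pkgs "/Volumes/catalog/schema/volume/r_libs" mmrm emmeans tern
```

&lt;P&gt;If the function prints nothing, every package you listed has a directory in the cache.&lt;/P&gt;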
&lt;OL start="3"&gt;
&lt;LI&gt;Rprofile.site is the wrong config file.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Databricks controls R library paths through environment variables (R_LIBS_USER, etc.) set in &lt;CODE&gt;/etc/R/Renviron.site&lt;/CODE&gt;. Changes in &lt;CODE&gt;Rprofile.site&lt;/CODE&gt; can get overridden by the Databricks startup chain, which is likely why your packages didn't show up at runtime. Your init script should modify &lt;CODE&gt;Renviron.site&lt;/CODE&gt; instead:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-bash"&gt;#!/bin/bash
set -euxo pipefail

VOLUME_PKGS="/Volumes/&amp;lt;catalog&amp;gt;/&amp;lt;schema&amp;gt;/&amp;lt;volume&amp;gt;/r_libs"

cat &amp;lt;&amp;lt;EOF &amp;gt;&amp;gt; /etc/R/Renviron.site
R_LIBS_USER=%U:/databricks/spark/R/lib:/local_disk0/.ephemeral_nfs/cluster_libraries/r:$VOLUME_PKGS
EOF
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This ensures &lt;CODE&gt;.libPaths()&lt;/CODE&gt; picks up your cached packages automatically when R starts.&lt;/P&gt;
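&lt;P&gt;One thing to watch with an append-style script is that running it more than once leaves duplicate &lt;CODE&gt;R_LIBS_USER&lt;/CODE&gt; lines. A minimal sketch of an idempotent append follows. The helper name &lt;CODE&gt;append_once&lt;/CODE&gt; is hypothetical, and the Volume path in the example is a placeholder:&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: append LINE to FILE only if FILE does not already contain it.
# Usage: append_once FILE LINE
append_once() {
  local file="$1" line="$2"
  if [ -f "$file" ]; then
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
    if grep -qxF -- "$line" "$file"; then
      return 0
    fi
  fi
  # tee -a appends to the file and echoes the line to stdout (handy in a log).
  printf '%s\n' "$line" | tee -a "$file"
}

# Example call, reusing the R_LIBS_USER value from the script above:
# append_once /etc/R/Renviron.site "R_LIBS_USER=%U:/databricks/spark/R/lib:/local_disk0/.ephemeral_nfs/cluster_libraries/r:/Volumes/catalog/schema/volume/r_libs"
```

&lt;P&gt;The whole-line match means a changed path is appended as a new line rather than replacing the old one, so this only guards against exact repeats.&lt;/P&gt;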
&lt;P&gt;Quick win for the PyPI packages: use the cluster Libraries tab directly (Compute → your cluster → Libraries → Install New → PyPI). Those get stored in the cluster-libraries location and don't need init scripts or notebooks.&lt;/P&gt;
&lt;P&gt;Once you've made these changes, verify on a fresh R notebook:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-r"&gt;.libPaths()
installed.packages()[, "Package"]
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Confirm the Volume path shows up and your packages are found there. Also check the init script log on the driver to make sure it ran cleanly.&lt;/P&gt;
&lt;P&gt;Worth flagging — the distro fix alone (jammy vs. noble) might solve 80% of this even without the caching layer. The caching on top of that should get your cold starts well under 2 minutes.&lt;/P&gt;
&lt;P&gt;Hope that helps. Let us know how it goes.&lt;/P&gt;
&lt;P&gt;Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2026 11:09:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150578#M53476</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-03-11T11:09:33Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150640#M53487</link>
      <description>&lt;P&gt;Thanks a lot, Louis! I tried your method and saved the init script in another volume.&amp;nbsp;&lt;SPAN&gt;But the cluster failed to start with this error: Init script failure: Cluster scoped init script /Volumes/datalake/test/utility_rlib_init_script/init-script-RLib.sh failed: Script exit status is non-zero. Why did it fail?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I have another question: is the default CRAN repository for package installation from the Libraries section of a classic cluster&amp;nbsp;&lt;A href="https://databricks.packagemanager.posit.co/cran/__linux__/noble/2025-03-20/" target="_blank"&gt;https://databricks.packagemanager.posit.co/cran/__linux__/noble/2025-03-20/&lt;/A&gt;? Could we assume we can use&amp;nbsp;&lt;A href="https://databricks.packagemanager.posit.co/cran/__linux__/jammy/2025-03-20/" target="_blank"&gt;https://databricks.packagemanager.posit.co/cran/__linux__/jammy/2025-03-20/&lt;/A&gt;?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 01:47:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150640#M53487</guid>
      <dc:creator>NW1000</dc:creator>
      <dc:date>2026-03-12T01:47:51Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150668#M53493</link>
      <description>&lt;P class="p1"&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182577"&gt;@NW1000&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="p1"&gt;Glad you tried my suggestion, and thanks for sharing the details.&lt;/P&gt;
&lt;P class="p1"&gt;1. Why the init script failed&lt;/P&gt;
&lt;P class="p1"&gt;This message:&lt;/P&gt;
&lt;P class="p4"&gt;&lt;EM&gt;Init script failure: Cluster scoped init script ... failed: Script exit status is non-zero&lt;/EM&gt;&lt;/P&gt;
&lt;P class="p1"&gt;really just means that something inside the bash script returned a non-zero exit code during cluster startup. In other words, the script hit an error and stopped.&lt;/P&gt;
&lt;P class="p1"&gt;The real clue will be in the init script log.&lt;/P&gt;
&lt;P class="p1"&gt;Here is where I would look:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Open the cluster details&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Go to the Event Log or driver logs&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Find the init script log file&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;For the script we were discussing, it should be something like:&lt;/P&gt;
&lt;P class="p4"&gt;/tmp/init-r-libs.log&lt;/P&gt;
&lt;P class="p1"&gt;Once you open that log, scroll to the bottom and look for the first real error message. That is usually where the root cause shows up.&lt;/P&gt;
&lt;P class="p1"&gt;In most cases, it tends to be one of these:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;a typo in a path, such as the Volume path or script path&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;missing execute permissions on the script, for example:&lt;/P&gt;
&lt;P class="p2"&gt;chmod +x init-script-RLib.sh&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;an R command inside the script failing, such as &lt;SPAN class="s1"&gt;install.packages()&lt;/SPAN&gt; returning an error, which will cause the whole script to exit non-zero&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;Once you have the last few lines from that log, it should be much easier to pinpoint exactly what failed and tighten up the script accordingly.&lt;/P&gt;
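&lt;P class="p1"&gt;To make that log easier to read, each step of an init script can announce itself and report its exit code. This is a minimal sketch and not a Databricks feature. The helper name &lt;CODE&gt;run_step&lt;/CODE&gt; is hypothetical, and the paths in the example are the ones from the original post:&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: run a command, report whether it succeeded,
# and pass its exit code through to the caller.
# Usage: run_step LABEL COMMAND [ARGS ...]
run_step() {
  local label="$1"
  shift
  if "$@"; then
    echo "OK: $label"
  else
    # Inside the else branch, $? still holds the exit code of the failed command.
    local rc=$?
    echo "FAILED: $label (exit $rc)"
    return "$rc"
  fi
}

# Example calls:
# run_step "cache directory exists" test -d /Volumes/datalake/test/rlib_cache
# run_step "copy packages" cp -R /Volumes/datalake/test/rlib_cache/. /usr/local/lib/R/site-library/
```

&lt;P class="p1"&gt;With this in place, the last &lt;CODE&gt;FAILED:&lt;/CODE&gt; line in the log names the step that caused the non-zero exit.&lt;/P&gt;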
&lt;P class="p1"&gt;2. About the default CRAN / Posit Package Manager URL&lt;/P&gt;
&lt;P class="p1"&gt;Yes — the URL you are seeing in the Libraries UI, something like:&lt;/P&gt;
&lt;P class="p4"&gt;&lt;A href="https://databricks.packagemanager.posit.co/cran/__linux__/noble/2025-03-20/" target="_blank"&gt;https://databricks.packagemanager.posit.co/cran/__linux__/noble/2025-03-20/&lt;/A&gt;&lt;/P&gt;
&lt;P class="p1"&gt;is the Databricks-managed Posit Package Manager snapshot used by Databricks runtimes for R packages.&lt;/P&gt;
&lt;P class="p1"&gt;A few important things to know here:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Databricks pins R libraries to a specific CRAN snapshot, in this case &lt;SPAN class="s1"&gt;2025-03-20&lt;/SPAN&gt;, so installs remain reproducible and stable&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;The &lt;SPAN class="s1"&gt;__linux__/&amp;lt;codename&amp;gt;/2025-03-20&lt;/SPAN&gt; portion reflects the underlying Ubuntu release, such as &lt;SPAN class="s1"&gt;jammy&lt;/SPAN&gt; or &lt;SPAN class="s1"&gt;noble&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;Databricks determines that automatically from the runtime OS for newer runtimes, including 17.x and above&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;That URL is intended to be used as the &lt;SPAN class="s1"&gt;repos=&lt;/SPAN&gt; value in &lt;SPAN class="s1"&gt;install.packages()&lt;/SPAN&gt;, not really as a browser-friendly page&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;So if you paste it into a browser and get something like “Invalid request,” that is not necessarily a problem — that can be expected behavior&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p3"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="p1"&gt;If you want your own scripts to follow the same pattern across runtimes, the safest approach is to detect the OS codename dynamically and construct the URL from there, like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;release &amp;lt;- system("lsb_release -c --short", intern = TRUE)
snapshot_date &amp;lt;- "2025-03-20"

options(
  HTTPUserAgent = sprintf(
    "R/%s R (%s)",
    getRversion(),
    paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
  ),
  repos = paste0(
    "https://databricks.packagemanager.posit.co/cran/__linux__/",
    release, "/", snapshot_date
  )
)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;That way:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;if the runtime is on &lt;SPAN class="s1"&gt;jammy&lt;/SPAN&gt;, it uses &lt;SPAN class="s1"&gt;.../__linux__/jammy/2025-03-20/&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="p1"&gt;if it is on &lt;SPAN class="s1"&gt;noble&lt;/SPAN&gt;, it uses &lt;SPAN class="s1"&gt;.../__linux__/noble/2025-03-20/&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p1"&gt;That mirrors how Databricks handles the default CRAN configuration internally.&lt;/P&gt;
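&lt;P class="p1"&gt;The same idea can be sketched in bash for use inside an init script. This assumes the image ships an &lt;CODE&gt;/etc/os-release&lt;/CODE&gt; file with a &lt;CODE&gt;VERSION_CODENAME&lt;/CODE&gt; field, as Ubuntu does. The function names &lt;CODE&gt;os_codename&lt;/CODE&gt; and &lt;CODE&gt;ppm_repo_url&lt;/CODE&gt; are hypothetical:&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the Ubuntu codename recorded in an os-release file.
# Usage: os_codename [OS_RELEASE_FILE]
os_codename() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$file"; printf '%s\n' "${VERSION_CODENAME:-unknown}" )
}

# Hypothetical helper: build the snapshot URL from a codename and a snapshot date.
# Usage: ppm_repo_url CODENAME SNAPSHOT_DATE
ppm_repo_url() {
  printf 'https://databricks.packagemanager.posit.co/cran/__linux__/%s/%s\n' "$1" "$2"
}

# Example call:
# ppm_repo_url "$(os_codename)" "2025-03-20"
```

&lt;P class="p1"&gt;Keeping the snapshot date as an explicit argument makes it obvious which snapshot a given cache was built against.&lt;/P&gt;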
&lt;P class="p1"&gt;Hope this helps, Louis.&lt;/P&gt;
</description>
      <pubDate>Thu, 12 Mar 2026 10:34:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150668#M53493</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-03-12T10:34:26Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150713#M53502</link>
      <description>&lt;P&gt;Hi Louis,&lt;/P&gt;&lt;P&gt;Thanks a lot for the great advice!&lt;/P&gt;&lt;P&gt;I used the 17.3 LTS ML runtime for this classic cluster. The code you gave returned "noble". Does that mean I should use 'noble' in the library installation URL?&amp;nbsp;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2026-03-12 at 10.49.51 AM.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/24760iBCEADF91A3127103/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2026-03-12 at 10.49.51 AM.png" alt="Screenshot 2026-03-12 at 10.49.51 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2026 14:54:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/150713#M53502</guid>
      <dc:creator>NW1000</dc:creator>
      <dc:date>2026-03-12T14:54:30Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten Classic Cluster start up time</title>
      <link>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/153654#M53990</link>
      <description>&lt;P&gt;Thanks for the detail so far. I'm new to this thread but facing a similar problem to&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/182577"&gt;@NW1000&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;I have a few big R dependencies (a messy combination of Matrix, lme4, and rstanarm). Installing them from jammy after starting a cluster used to take 10 minutes, but it now takes 30 (I haven't sorted out why).&lt;/P&gt;&lt;P&gt;I followed your recommendations, copied the packages to Volumes, and then made the new init script for Renviron.site. But the cluster fails to start, as it did for NW1000. Gemini suggests this is because the cluster doesn't have access to Volumes at that point in the boot, but I'm still struggling to get it all fixed. I'm also struggling to find the init script log in the Driver Logs.&lt;/P&gt;&lt;P&gt;Are there any other potential fixes I can explore?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 19:57:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/shorten-classic-cluster-start-up-time/m-p/153654#M53990</guid>
      <dc:creator>RyanTImpe</dc:creator>
      <dc:date>2026-04-07T19:57:34Z</dc:date>
    </item>
  </channel>
</rss>

