Failed to install cluster scoped SparkR library
08-04-2022 05:10 AM
I'm attempting to install SparkR on the cluster; other packages such as tidyverse installed successfully via CRAN. The error is copied below; any help you can provide is greatly appreciated!
Databricks Runtime 10.4 LTS
Library installation attempted on the driver node of cluster 0527-101647-p8flqbiy and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Error: Error installing R package: Could not install package with error: package ‘SparkR’ is not available for this version of R
Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8gumapnw.log
Error: Error installing R package: Could not install package with error: package ‘SparkR’ is not available for this version of R
Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log
Labels: Error Message, Library, SparkR
08-04-2022 05:23 AM
Hi @Ross Hamilton, it looks like the package is failing to install. You can check the log to understand what the issue is.
I believe the failure is caused by a missing dependency. I would recommend checking for the dependent libraries and installing them.
You can try the steps below as well (a combined sketch follows after step c).
a) Install the library from a notebook:
install.packages("package_name")  # placeholder: the package you need
b) Copy the installed library to DBFS. The source is the ephemeral environment path that install.packages reports, for example:
cp -R /local_disk0/.ephemeral_nfs/envs/rEnv-<id> /dbfs/path_to_r_library
c) Use an init script to copy the installed libraries into the cluster library path at startup:
## Define contents of script
script = """
#!/bin/bash
R --vanilla <<EOF
system("cp -R /dbfs/path_to_r_library /databricks/spark/R/lib/", intern = T)
q()
EOF
"""
08-04-2022 06:11 AM
Hi, I'm afraid the solution doesn't work:
install.packages("SparkR")
Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-1c6d7759-7751-4681-8489-a027452405f0’
(as ‘lib’ is unspecified)
Warning: package ‘SparkR’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
However, if I load the SparkR library directly it works fine. I think the same issue occurs when installing the package at the cluster level.
library(SparkR)
Attaching package: ‘SparkR’
The following object is masked _by_ ‘.GlobalEnv’:
setLocalProperty
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
The following objects are masked from ‘package:base’....
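Since the library loads, one quick check to confirm it's the build bundled with the runtime rather than a CRAN copy (a small diagnostic sketch; the expected location comes from the cluster library path mentioned above):
packageVersion("SparkR")  # should track the cluster's Spark version
find.package("SparkR")    # expected under /databricks/spark/R/lib on Databricks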
08-04-2022 07:24 AM
Did you find any information in the logs?
Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log
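If it helps, the log can be read straight from a notebook, since R runs on the driver (path copied from the error above):
# Print the driver-side install log from an R notebook cell.
log_path <- "/databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log"
cat(readLines(log_path), sep = "\n")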
08-04-2022 02:08 PM
Hi @Ross Hamilton,
I believe SparkR comes built in with Databricks RStudio, so you don't have to install it explicitly. You can import it directly with library(SparkR), which your comment above shows already works for you.
The error message you see could be a red herring. Does this answer your question, or am I missing something?
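For completeness, a minimal sketch of using the preinstalled SparkR with no install step at all (faithful is just a built-in example data frame):
library(SparkR)
sparkR.session()              # attaches to the cluster's existing Spark session
df <- as.DataFrame(faithful)  # promote a local R data frame to a Spark DataFrame
head(df)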

