Failed to install cluster scoped SparkR library

Ross
New Contributor II

I am trying to install SparkR on the cluster as a cluster-scoped library. Other packages, such as tidyverse, installed successfully via CRAN. The error is copied below; any help you can provide is greatly appreciated!

Databricks runtime 10.4 LTS

Library installation attempted on the driver node of cluster 0527-101647-p8flqbiy and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Error: Error installing R package: Could not install package with error: package ‘SparkR’ is not available for this version of R

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8gumapnw.log

Error: Error installing R package: Could not install package with error: package ‘SparkR’ is not available for this version of R

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log

4 REPLIES

Prabakar
Databricks Employee

Hi @Ross Hamilton, it looks like the package is failing to install. You can check the log to see what the issue is.

I believe the failure is caused by a missing dependency. I would recommend checking for the dependent libraries and installing them.

You can also try the steps below.

a) Install the library from a notebook:

install.packages()

b) Copy the installed library to DBFS:

cp -R /local_disk/env /dbfs/path_to_r_library

c) Use an init script to copy the installed libraries into the cluster's library path:

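## Define the contents of the init script (Python notebook cell).
## The shebang must be the first line of the script, so it follows the opening quotes directly.
script = """#!/bin/bash
R --vanilla <<EOF
system("cp -R /dbfs/path_to_r_library /databricks/spark/R/lib/", intern = TRUE)
q()
EOF
"""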

Ross
New Contributor II

Hi

I'm afraid the solution doesn't work:

install.packages("SparkR")
 
Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-1c6d7759-7751-4681-8489-a027452405f0’
(as ‘lib’ is unspecified)
Warning: package ‘SparkR’ is not available for this version of R
 
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

However, loading the library with library(SparkR) works fine. I think this is the same issue I see when installing the package on the cluster.

library(SparkR)
 
Attaching package: ‘SparkR’
 
The following object is masked _by_ ‘.GlobalEnv’:
 
    setLocalProperty
 
The following objects are masked from ‘package:stats’:
 
    cov, filter, lag, na.omit, predict, sd, var, window
 
The following objects are masked from ‘package:base’....

Prabakar
Databricks Employee

Do you find any information in the logs?

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log
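
If it helps, the install logs can be skimmed from a Python notebook cell, which runs on the driver where these files live. This is only a sketch: the directory and the `r-package-installation-*.log` file name pattern are taken from the error message above, the log format itself is an assumption (the code just keeps lines that mention "error" or "warning"), and `find_problem_lines` and `scan_install_logs` are hypothetical helper names.

```python
import glob
import os

# Directory taken from the error message in the original post.
LOG_DIR = "/databricks/driver/library-install-logs"


def find_problem_lines(log_text, keywords=("error", "warning")):
    """Return the lines of `log_text` that contain any keyword (case-insensitive)."""
    return [
        line.rstrip()
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in keywords)
    ]


def scan_install_logs(log_dir=LOG_DIR, pattern="r-package-installation-*.log"):
    """Map each matching log file path to its error/warning lines.

    Returns an empty dict if the directory does not exist or has no matches.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, errors="replace") as f:
            results[path] = find_problem_lines(f.read())
    return results


if __name__ == "__main__":
    for log_path, problems in scan_install_logs().items():
        print(log_path)
        for line in problems:
            print("   ", line)
```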

Vivian_Wilfred
Databricks Employee

Hi @Ross Hamilton,

I believe SparkR comes built in with Databricks RStudio, so you don't have to install it explicitly. You can load it directly with library(SparkR), which, going by your comment above, already works for you.

The error message you see could be a red herring. Does this answer your question, or am I missing something?
