Failed to install cluster-scoped SparkR library

Ross
New Contributor II

I'm attempting to install SparkR on the cluster; I've successfully installed other packages, such as tidyverse, via CRAN. The error is copied below, and any help you can provide is greatly appreciated!

Databricks Runtime 10.4 LTS

Library installation attempted on the driver node of cluster 0527-101647-p8flqbiy and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Error: Error installing R package: Could not install package with error: package ‘SparkR’ is not available for this version of R

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8gumapnw.log

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log

4 REPLIES

Prabakar
Esteemed Contributor III

Hi @Ross Hamilton, it looks like the package is failing to install. You can check the log to understand what the issue is.

I believe the failure is caused by a missing dependency. I would recommend checking for the dependent libraries and installing them first; install.packages() can also resolve dependencies for you, as in the sketch below.
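
As a minimal sketch (tidyverse is just a stand-in here for whichever package is failing):

# ask install.packages() to pull in the package's dependencies as well
install.packages("tidyverse", dependencies = TRUE)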

You can also try the steps below.

a) Install the library with a command from a notebook:

install.packages("<package_name>")  # placeholder: substitute the package you need

b) Copy the installed library to DBFS (e.g. from a %sh cell):

cp -R /local_disk/env /dbfs/path_to_r_library

c) Use an init script to copy the installed libraries into the cluster's library path.

## Define contents of script
script = """
#!/bin/bash
R --vanilla <<EOF
system("cp -R /dbfs/path_to_r_library /databricks/spark/R/lib/", intern = T)
q()
EOF
"""

Ross
New Contributor II

Hi,

I'm afraid the suggested solution doesn't work:

install.packages("SparkR")
 
Installing package into ‘/local_disk0/.ephemeral_nfs/envs/rEnv-1c6d7759-7751-4681-8489-a027452405f0’
(as ‘lib’ is unspecified)
Warning: package ‘SparkR’ is not available for this version of R
 
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

However, if I load the SparkR library, it works fine. I think it's the same issue as when installing the package on the cluster.

library(SparkR)
 
Attaching package: ‘SparkR’
 
The following object is masked _by_ ‘.GlobalEnv’:
 
    setLocalProperty
 
The following objects are masked from ‘package:stats’:
 
    cov, filter, lag, na.omit, predict, sd, var, window
 
The following objects are masked from ‘package:base’....

Prabakar
Esteemed Contributor III

Did you find any information in the logs?

Full error log available at /databricks/driver/library-install-logs/r-package-installation-2022-08-04T12:04:30Z-8vp_gpk1.log

Vivian_Wilfred
Honored Contributor

Hi @Ross Hamilton,

I believe SparkR comes built into the Databricks runtime (including RStudio on Databricks), so you don't have to install it explicitly. You can import it directly with library(SparkR), and from your comment above that already works for you.

The error message you see could be a red herring. Does this answer your question, or am I missing something?
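
As a quick check (a minimal sketch; no installation involved), you can confirm the bundled copy is already on the driver:

# confirm the preinstalled SparkR without installing anything
library(SparkR)
packageVersion("SparkR")  # matches the Spark version bundled with the runtime
find.package("SparkR")    # e.g. resolves under /databricks/spark/R/lib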
