Arrow R package fails to install

apw
New Contributor II
# Databricks notebook source
.libPaths()
 
# COMMAND ----------
 
dir("/databricks/spark/R/lib")
 
# COMMAND ----------
 
## Add current working directory to library paths
.libPaths(c(getwd(), .libPaths()))
 
# COMMAND ----------
 
## Install the latest versions from CRAN
install.packages(c('arrow', 'tidyverse', 'aws.s3', 'sparklyr', 'cluster', 'sqldf', 'lubridate', 'ChannelAttribution'), repos = "http://cran.us.r-project.org")
 
# COMMAND ----------
 
dir("/tmp/Rserv/conn970")
 
# COMMAND ----------
 
## Copy from the session directory into the driver's local site library
system("cp -R /tmp/Rserv/conn970 /usr/lib/R/site-library")
 
# COMMAND ----------
 
dir("/usr/lib/R/site-library")
 
# COMMAND ----------
 
## Copy from the driver to DBFS so the packages persist for the workflow
system("cp -R /tmp/Rserv/conn970 /dbfs/r-libraries")
 
# COMMAND ----------
 
dir("/dbfs/r-libraries")
# COMMAND ----------
 
# Add packages to libPaths
.libPaths("/dbfs/r-libraries")
 
# COMMAND ----------
 
# Check that the DBFS library is now in .libPaths()
.libPaths()

About six weeks ago (early April 2022) I tested a workflow to confirm that I could trigger Databricks jobs remotely from Airflow, and it worked.

As part of the process, the workflow activates a pre-built compute and then loads various R libraries from DBFS onto it. One of those packages is 'arrow'; while all of the other packages load without issue, arrow fails to load and crashes my workflow.

When I look into the workflow I get the following error: 'DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Command to install library [RCranPkgId(arrow,None,None)] on [0303-130414-840hkwxf] orgId [5132544506122561] failed inside Databricks infrastructure' (see the attached "Arrow Fail Message" image). I have also triggered the workflow directly inside Databricks and still get the same problem, so clearly it does not have an Airflow-related cause.
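
One thing that may help to narrow this down (a sketch of an idea, not part of my original test): running the install interactively in a notebook cell on the driver shows the full build log that the DRIVER_LIBRARY_INSTALLATION_FAILURE message hides. The NOT_CRAN / LIBARROW_BINARY variables come from the arrow package's Linux install notes and ask it to fetch a prebuilt libarrow instead of compiling from source; the CRAN mirror is the same one used in the script above.

## Sketch: install arrow interactively on the driver to surface the underlying error
Sys.setenv(NOT_CRAN = "true", LIBARROW_BINARY = "true")
install.packages("arrow", repos = "http://cran.us.r-project.org")
library(arrow)  # confirm it loads before copying anything to DBFS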

I tried to delete the arrow package from DBFS to see if I could run the test without it, but every time I delete it, it returns when I retry the workflow.
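
For illustration, the kind of removal I mean looks roughly like this (a sketch rather than the exact command I ran; the path is the DBFS library used in the script above):

## Sketch: remove the copied arrow package from the DBFS library
unlink("/dbfs/r-libraries/arrow", recursive = TRUE)
dir("/dbfs/r-libraries")  # confirm the arrow directory is gone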

I then checked CRAN to see whether arrow had been updated recently; it had been, on 2022-05-09, so I loaded an older version instead (having first deleted everything relating to it from DBFS). This didn't work either, see the images attached.
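
For reference, pinning an older release can be done along these lines (a sketch only; 7.0.0 is just an example version number, and the remotes package may need installing first):

## Sketch: install a specific older arrow release from the CRAN archive
install.packages("remotes", repos = "http://cran.us.r-project.org")
remotes::install_version("arrow", version = "7.0.0", repos = "http://cran.us.r-project.org")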

I have also attached the script I am using in R to load the packages to DBFS. I think the script itself is sound, since every other package loads properly, but it may help in understanding what I am doing or why the error occurs.

What I'd like to know is:

  1. Do you see anything that I might be doing incorrectly inside my attached libraries script?
  2. Is there an issue with loading the arrow package, and if so, do you have a workaround that can prevent the failure?
  3. Why does DBFS continue to reinstall arrow despite my removing it from the directory?
  4. Can I permanently remove the arrow package from DBFS without it returning every time I trigger the workflow?

Many thanks in advance for any help you guys can offer.

2 REPLIES

Atanu
Esteemed Contributor

@Anthony McGrath, can you please download the package and upload it to DBFS again to see if the issue still persists? You can also check whether any global init script is reinstalling it on your cluster.
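
As an illustration of that check (the /dbfs/databricks/init_scripts path below is only an example; adjust it to wherever your workspace keeps its init scripts), something like this in an R notebook cell will show whether any of them mention arrow:

## Sketch: look for init scripts on DBFS that reference arrow (path is an example only)
dir("/dbfs/databricks/init_scripts")
system("grep -rl arrow /dbfs/databricks/init_scripts", intern = TRUE)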

Kaniz
Community Manager

Hi @Anthony McGrath, we haven't heard from you since the last response from @Atanu Sarkar, and I was checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
