Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

R packages not getting installed on cluster when creating cluster from dockerfile

nav
New Contributor II

I'm trying to use a Dockerfile to create a cluster that has Robyn (https://facebookexperimental.github.io/Robyn/) and other R libraries installed, but the R libraries are not being installed on the cluster. When I run the container in interactive mode, I can see the R libraries.

How can I use a Dockerfile to create a cluster with these R libraries installed?

Thank you

Attachments:

  1. screenshot of dockerfile
  2. driver log error
8 REPLIES

Debayan
Databricks Employee

Hi, the error suggests that one package cannot be located. Could you please verify that the package name and the address of the package are valid?

Also, please tag @Debayan Mukherjee in your next response so that I am notified. Thank you!

nav
New Contributor II

@Debayan Mukherjee Thank you for your reply. The package name and the address are valid. I can see the package version when I run the container in interactive mode, but none of the R packages are installed on the cluster when I use the Docker image to create it. Am I missing something in the Dockerfile?

nav
New Contributor II

@Debayan Mukherjee Hi, I wanted to follow up on this. Please let me know if you need any more information from my side.

Debayan
Databricks Employee

Hi @Navneet Sonak, sorry for the delay! We would also like to know how the Docker image was created. It is possible that something is missing from the Docker image code. Also, does it work with the default DBR cluster?

nav
New Contributor II

Hi @Debayan Mukherjee, the Docker image is created using an Argo workflow. I used this Dockerfile as a reference: https://github.com/databricks/containers/blob/master/ubuntu/R/Dockerfile. I'm not sure I follow your second question. The cluster is created fine, but it is missing all the R packages that the Dockerfile should have installed.

Here's my dockerfile code:

FROM databricksruntime/standard:10.4-LTS

# Suppress interactive configuration prompts
ENV DEBIAN_FRONTEND=noninteractive
ENV DOWNLOAD_STATIC_LIBV8=1
ENV TZ=America/New_York

# install dependencies
RUN apt-get update \
  && apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
  && add-apt-repository -y 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' \
  && apt-get install build-essential --yes \
   dirmngr gnupg apt-transport-https ca-certificates software-properties-common \
   autoconf \
   automake \
   g++ \
   gcc \
   cmake \
   gfortran \
   make \
   nano \
   liblapack-dev \
   liblapack3 \
   libopenblas-base \
   libopenblas-dev \
   libcurl4-openssl-dev\
   libxml2-dev\
   libssl-dev\
   libnlopt-dev \
   r-base \
   r-base-dev \
  && apt-get clean all \
  && rm -rf /var/lib/apt/lists/*

RUN R -e "install.packages(c('remotes', 'shiny'), repos='https://cran.microsoft.com/')"
#RUN R -e "remotes::install_github('facebookexperimental/Robyn/R');"
RUN R -e "install.packages('Robyn')"
RUN R -e "library(Robyn)"

# # DBI/ODBC dependencies
RUN R -e "install.packages(c('DBI', 'dplyr','dbplyr','odbc'), repos='https://cran.microsoft.com/')"

# # Databricks dependencies
# # hwriterPlus is used by Databricks to display output in notebook cells
# # Rserve allows Spark to communicate with a local R process to run R code
RUN R -e "install.packages(c('hwriterPlus'), repos='https://mran.revolutionanalytics.com/snapshot/2017-02-26')"
RUN R -e "install.packages(c('htmltools'), repos='https://cran.microsoft.com/')"
RUN R -e "install.packages('Rserve', repos='http://rforge.net/')"
RUN R -e "install.packages('reticulate');"
RUN R -e "library(reticulate)"

# ## Install Nevergrad
# # RUN R -e "reticulate::use_python('/opt/conda/bin/python3')"
# # RUN R -e "reticulate::py_config()"
# # RUN R -e "reticulate::py_install('nevergrad', pip = TRUE)"
RUN /databricks/python3/bin/pip install nevergrad
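
Since the packages are visible in the container but not on the cluster, one way to narrow this down is to record at build time where R actually installed them, then compare that with the output of `.libPaths()` in an R notebook cell on the cluster. The step below is only a sketch and not part of the original Dockerfile. The package list is taken from the Dockerfile above, and nothing in this thread confirms that a library-path mismatch is the cause.

```dockerfile
# Hypothetical diagnostic step (an assumption, not from the original Dockerfile).
# Prints the library search path that R uses inside the image, then the
# directory each package was installed into. find.package() raises an error
# for a missing package, which makes this build step fail.
RUN R -e "print(.libPaths()); for (p in c('Robyn', 'reticulate', 'Rserve')) print(find.package(p))"
```

If the directories printed during the build do not appear in `.libPaths()` on the running cluster, that would suggest the cluster's R session is not looking where the image installed the packages.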

Debayan
Databricks Employee

We need to review the Dockerfile code before proceeding. It would be helpful if you created a support case for this, which will ensure that it is routed to the right team.

Also, are there any dependency package failures?

Vartika
Databricks Employee

Hi @Navneet Sonak​ 

Hope you are doing well.

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Cheers!

workingtogetdbw
New Contributor II

What? There has been no answer here! @Debayan Mukherjee @Vartika Nain

I am running into this same problem. Having to wait 45 minutes for libraries to install is absolutely wild, and I have tried everything short of working with the Docker container.

FROM databricksruntime/standard:9.x
 
# based on these instructions (avoiding firewall issue for some users):
# https://cran.rstudio.com/bin/linux/ubuntu/#secure-apt
RUN apt-get update \
&& DEBIAN_FRONTEND="noninteractive" apt-get install --yes software-properties-common apt-transport-https \
&& gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | sudo apt-key add - \
&& add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' \
&& apt-get update \
&& DEBIAN_FRONTEND="noninteractive" apt-get install --yes \
libssl-dev \
r-base \
r-base-dev \
&& add-apt-repository -r 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' \
&& apt-key del E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 
# UPDATE A SERIES OF PACKAGES
# RUN apt-get update --fix-missing && apt-get install -y ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 libxml2-dev
 
# hwriterPlus is used by Databricks to display output in notebook cells
# Rserve allows Spark to communicate with a local R process to run R code
# shiny is used by Databricks interpreter
RUN R -e "install.packages(c('hwriter', 'TeachingDemos', 'htmltools'))"
RUN R -e "install.packages('https://cran.r-project.org/src/contrib/Archive/hwriterPlus/hwriterPlus_1.0-3.tar.gz', repos=NULL, type='source')"
RUN R -e "install.packages('Rserve', repos='http://rforge.net/', type='source')"
RUN R -e "install.packages('shiny', repos='https://cran.rstudio.com/')"
# Added packages for the project that I am currently working on
RUN R -e "install.packages(c('sparklyr', 'remotes', 'plyr', 'dplyr', 'rlist', 'stringr', 'rlist', 'ggplot2', 'patchwork', 'scales', 'Robyn', 'reticulate'))"
# Install nevergrad Python package
RUN python3 -m pip install nevergrad
RUN R -e "library(reticulate); reticulate::py_config()"
RUN R -e "install.packages('devtools', repos='https://cran.rstudio.com/')"
RUN R -e "remotes::install_github('mlflow/mlflow', subdir = 'mlflow/R/mlflow')"
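
Because `python3 -m pip` and reticulate do not necessarily resolve to the same Python interpreter, a build-time check that reticulate can see nevergrad may be worth adding. This is a sketch under that assumption and not part of the Dockerfile above. Whether it relates to the missing R packages is not established in this thread.

```dockerfile
# Hypothetical check (an assumption, not from the original Dockerfile).
# py_module_available() returns TRUE only if the Python that reticulate
# selects can import the module; stopifnot() fails the build otherwise.
RUN R -e "library(reticulate); stopifnot(py_module_available('nevergrad'))"
```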

I went with the runtime image because there is a use case for MLflow. I am hit by the Stan issues as well as by issues getting mlflow installed.

It is very clear that R isn't supported much in Databricks: there was a resolved issue that was never merged into main, and it was last updated 10 months ago.

@Navneet Sonak let me know if you end up solving this with the Docker image. I would be super grateful.
