@Kaniz_Fatma Thank you for your detailed response! We would like to use Docker if we can, because we are not using RStudio; we run R directly in Databricks notebooks and workflows. So any more information about R, Docker, and Databricks would also be useful. Currently, this Dockerfile builds successfully and the image is archived successfully, but it does not deploy on Databricks.
# syntax=docker/dockerfile:1.2
# Stage 1: Build R environment with Rocker
FROM --platform=linux/amd64 rocker/r-base:latest AS rbuilder
# Install required R packages in the Rocker image
RUN apt-get update && apt-get install -y \
    r-cran-dplyr \
    r-cran-ggplot2 \
    r-cran-tidyr \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Stage 2: Use Databricks image and copy R installation from Rocker
FROM --platform=linux/amd64 databricksruntime/standard:latest
# Copy R binaries and libraries from the Rocker image
COPY --from=rbuilder /usr/lib/R /usr/lib/R
COPY --from=rbuilder /usr/share/R /usr/share/R
COPY --from=rbuilder /etc/R /etc/R
COPY --from=rbuilder /usr/bin/R /usr/bin/R
COPY --from=rbuilder /usr/bin/Rscript /usr/bin/Rscript
# Ensure the R library paths are correctly set
ENV R_HOME=/usr/lib/R
ENV PATH=$PATH:/usr/lib/R/bin
# Copy R packages from the previous stage
COPY --from=rbuilder /usr/lib/R/site-library /usr/local/lib/R/site-library
COPY --from=rbuilder /usr/lib/x86_64-linux-gnu/ /usr/lib/x86_64-linux-gnu/
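One thing that may be worth checking inside the built image is whether the copied R binary can still resolve its shared libraries, since the two stages start from different base images. The sketch below is only a diagnostic idea, not a confirmed cause of the deployment failure. `missing_libs` is a hypothetical helper name, and the binary path `/usr/lib/R/bin/exec/R` is an assumption based on the `R_HOME=/usr/lib/R` set above.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic linker reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Assumed location of the real R executable under R_HOME=/usr/lib/R
# (/usr/bin/R itself is a shell wrapper, so ldd is not useful on it).
r_bin=/usr/lib/R/bin/exec/R

# Only run the check where the binary actually exists, e.g. inside the image.
if [ -e "$r_bin" ]; then
    ldd "$r_bin" | missing_libs
fi
```

If this prints any library names when run inside the final image, those are libraries the R binary expects but cannot find there.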
I have solved my dependency problem with the following code in my notebook, but I am a bit confused about why it works. PROJ_LIB has to be set to /usr/share/proj, and then the PROJ library path has to be given again as /lib/x86_64-linux-gnu in configure.args when installing sf and prism. Also, the repo for sf has to be https://cran.r-project.org, while the repo for prism can be https://packagemanager.rstudio.com/cran/__linux__/focal/latest. I would like to use the second repo as much as possible to install R packages because it is much faster than CRAN.
%r
system('sudo apt-get -y update && sudo apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev')
%sh
ldconfig -p | grep gdal
ldconfig -p | grep geos
ldconfig -p | grep proj
%r
options(HTTPUserAgent = sprintf(
  "R/%s R (%s)",
  getRversion(),
  paste(
    getRversion(),
    R.version["platform"],
    R.version["arch"],
    R.version["os"]
  )
))
Sys.setenv(PROJ_LIB = "/usr/share/proj")
install.packages('units',
  lib = '/databricks/spark/R/lib/',
  repos = "https://cran.r-project.org"
)
install.packages('sf',
  configure.args = "--with-proj-lib=/lib/x86_64-linux-gnu --with-proj-include=/usr/include",
  lib = '/databricks/spark/R/lib/',
  repos = "https://cran.r-project.org"
)
library(sf, lib.loc='/databricks/spark/R/lib/')
install.packages('prism',
  configure.args = "--with-proj-lib=/lib/x86_64-linux-gnu --with-proj-include=/usr/include",
  lib = '/databricks/spark/R/lib/',
  repos = c(CRAN = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest")
)
library(prism, lib.loc='/databricks/spark/R/lib/')
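A possible explanation for the two different PROJ paths is that they refer to two different things: PROJ_LIB points PROJ at its data directory (where proj.db lives), while --with-proj-lib tells the sf configure script where the shared library libproj.so is. The sketch below is one way to check that on the cluster. `find_dir_with` is a hypothetical helper, and the candidate directories other than /usr/share/proj and /lib/x86_64-linux-gnu are guesses rather than paths taken from the notebook above.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory from the argument list that
# contains a file matching the given glob pattern; return 1 if none does.
find_dir_with() {
    pattern=$1
    shift
    for dir in "$@"; do
        # $pattern is left unquoted on purpose so the shell expands the glob
        # inside "$dir"; an unmatched glob stays literal and fails the -e test.
        for f in "$dir"/$pattern; do
            if [ -e "$f" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Data directory, i.e. the value PROJ_LIB is expected to hold.
find_dir_with 'proj.db' /usr/share/proj /usr/local/share/proj \
    || echo "proj.db not found in the candidate directories"

# Library directory, i.e. the value passed to --with-proj-lib.
find_dir_with 'libproj.so*' /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu /usr/local/lib \
    || echo "libproj.so not found in the candidate directories"
```

If the two commands print different directories, that would be consistent with needing /usr/share/proj for PROJ_LIB and /lib/x86_64-linux-gnu for the configure argument.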
Anyway! Thank you again for answering.