When installing notebook-scoped R libraries, I don't want to specify the custom CRAN mirror manually each time, like this:
install.packages("diffdf", repos = "my_custom_cran_url")
Instead, I want the custom CRAN mirror URL to be used by default, so that a plain call is enough:
install.packages("diffdf")
Normally this is done by adjusting the .Rprofile or Rprofile.site files. Unfortunately, those files only take effect in RStudio sessions, not in the SparkR sessions used by the R notebooks.
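For reference, this is roughly what such an entry in .Rprofile or Rprofile.site usually looks like. It is only a sketch of the generic R mechanism: "my_custom_cran_url" is the placeholder from above, and as described, this alone does not reach the SparkR notebook sessions.

```r
# Sketch of a typical .Rprofile / Rprofile.site entry.
# "my_custom_cran_url" is a placeholder, not a real URL.
local({
  repos <- getOption("repos")            # keep any other repositories already configured
  repos["CRAN"] <- "my_custom_cran_url"  # override only the CRAN entry
  options(repos = repos)
})
```

Once this option is set in a session, install.packages("diffdf") picks up the mirror without an explicit repos argument.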
After some trial and error I figured out that specifying the default CRAN mirror URL under /databricks/spark/R/lib/SparkR/R/SparkR works as desired. However, I can't update this file automatically via a cluster-scoped init script, because at the time the init script runs, lib/SparkR/R/SparkR doesn't exist yet (the path/file is somehow built dynamically at a later time).
Unfortunately I couldn't find any useful information for this particular use case on the internet. Does anyone have an idea?