Hello. I am using R on Databricks with the approach below.
My current cluster configuration:
Single node: i3.2xlarge · On-demand · DBR: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12) · us-east-1a
I install all the R packages I need (including the geospatial package terra) in my notebook and zip the installed library so that I don't have to reinstall the packages on every run; the sketch below shows the rough idea.
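This is roughly what the notebook does; the paths and the zip location are illustrative, not my exact ones:

```r
# Install into a dedicated library directory so the whole thing can be zipped
lib_dir <- "/local_disk0/r-libs"
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
install.packages("terra", lib = lib_dir)

# Zip with paths relative to the library's parent so it unzips cleanly
old_wd <- setwd(dirname(lib_dir))
utils::zip(zipfile = "/dbfs/tmp/r-packages.zip", files = basename(lib_dir))
setwd(old_wd)
```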
I then deploy a job that does the following (sketched after the list):
1. Fetch the zipped R packages and unzip them
2. Load the libraries
3. Do the actual work
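The job side looks roughly like this (again, paths are illustrative):

```r
# Restore the prebuilt library and make R search it first
utils::unzip("/dbfs/tmp/r-packages.zip", exdir = "/local_disk0")
.libPaths(c("/local_disk0/r-libs", .libPaths()))

library(terra)  # loads from the restored library, no reinstall

# ... do the actual work ...
```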
The job takes an hour to complete.
However, when I upgrade the runtime to the configuration below, the run time increases dramatically.
Single node: i3.2xlarge · On-demand · DBR: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12) · us-east-1a
I am not a Spark expert, but why does changing from 11.3 to 13.3 increase the run time? Would the ideal solution be to rebuild the zipped packages on 13.3 instead of 11.3?
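One thing I was planning to check is whether the two runtimes ship different R versions, since packages built under one R version may not behave well under another; R.version.string and the "Built" column of installed.packages() are base R:

```r
# Run on each runtime: compare the shipped R version with the
# R version the zipped packages were built under
R.version.string
installed.packages(lib.loc = "/local_disk0/r-libs")[, "Built"]
```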