I’ve had success both with R magic (R cells in a Python notebook) and with running an R script from a Python notebook, right up to the point of connecting R to the Spark cluster. In either case, I can’t get a `SparkSession` to initialize.
2-cell (Python) notebook example:
Cell 1:

```python
%load_ext rpy2.ipython
```

Cell 2:

```r
%%R
library(SparkR)
sparkR.session()
```
Error message from cell 2:
```
R[write to console]: Spark package found in SPARK_HOME: /databricks/spark
Launching java with spark-submit command /databricks/spark/bin/spark-submit sparkr-shell /tmp/RtmpPjujEO/backend_port19cf5178fd7d
R[write to console]: Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
R[write to console]: In addition:
R[write to console]: There were 50 or more warnings (use warnings() to see the first 50)
R[write to console]:
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
RInterpreterError: Failed to parse and evaluate line 'sparkR.session()'.
```
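The R-script approach I mentioned hits the same wall: the session never initializes. For reference, that variant is just a subprocess call along these lines (the script path and name here are illustrative, not my actual code):

```python
# Sketch of the "run an R script from a Python notebook" variant.
# The script at this (illustrative) path contains only:
#   library(SparkR); sparkR.session()
import subprocess

result = subprocess.run(
    ["Rscript", "/dbfs/tmp/sparkr_session_test.R"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```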
For my collaboration use case, it would be more efficient to work in a Python notebook and do the analysis itself in native R. Is it possible to use SparkR from within a Python notebook in Databricks?
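To make the goal concrete, this is roughly what I would like to be able to run in a `%%R` cell once a session is available (the data frame is just a placeholder example):

```r
%%R
library(SparkR)
sparkR.session()                  # the call that currently fails
sdf <- createDataFrame(faithful)  # any SparkR operation would do here
head(sdf)
```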