I’ve had success with R magic (R cells in a Python notebook) and running an R script from a Python notebook, up to the point of connecting R to a Spark cluster. In either case, I can’t get a `SparkSession` to initialize.
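For context, the "running an R script from a Python notebook" route can be sketched roughly like this: write the R code to a temporary file and invoke `Rscript` via `subprocess`. This is a generic illustration (the helper name `run_r_script` is hypothetical, not a Databricks API), and it guards against `Rscript` not being on the PATH.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Optional

def run_r_script(code: str) -> Optional[str]:
    """Run an R snippet with Rscript if it is installed; return its stdout, else None."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None  # R is not installed on this machine
    with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([rscript, path], capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)

# A trivial R snippet; on a machine without R this prints None.
print(run_r_script('cat("hello from R\\n")\n'))
```

This approach runs R in a separate process, so the R session has no shared state with Python; rpy2's `%%R` magic, by contrast, keeps a persistent embedded R session across cells.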
2-cell (Python) notebook example:
```r
%%R
library(SparkR)
sparkR.session()
```
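(For anyone reproducing this: the `%%R` magic comes from rpy2's IPython extension, which has to be loaded in an earlier cell before `%%R` is available. A minimal first cell might look like the following; this is the standard rpy2 setup, not anything Databricks-specific.)

```
%load_ext rpy2.ipython
```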
Error message from cell 2:
```
R[write to console]: Spark package found in SPARK_HOME: /databricks/spark
Launching java with spark-submit command /databricks/spark/bin/spark-submit sparkr-shell /tmp/RtmpPjujEO/backend_port19cf5178fd7d
R[write to console]: Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
R[write to console]: In addition:
R[write to console]: There were 50 or more warnings (use warnings() to see the first 50)
R[write to console]: Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
RInterpreterError: Failed to parse and evaluate line 'sparkR.session()'.
```
For my collaboration use case, it would be more efficient to drop into a Python notebook and perform the analysis in native R. Is it possible to use SparkR from within a Python notebook in Databricks?
Yes, you can use SparkR in Databricks notebooks and keep your native R code. At the top of the notebook in the Databricks GUI, you can set the notebook language to R, so you don't need to add `%%R` to every cell.
You can also import the IPython notebook you are using into Databricks, which will convert it to a Databricks notebook. Then set the language to R at the top and you are good to run.
For collaboration, would it be possible for the other person to get restricted Databricks access so they can work on the notebooks with you in the Databricks GUI? Our notebooks allow multiple people to make edits at the same time, and you can share notebooks with one another. You can also leave comments for one another to help improve your collaboration.
I'm actually interested in explicitly running a notebook in Python while running R code from within it. I have no problem connecting R to a Spark cluster when the notebook language is set to R. Is there a way to connect R to Spark in the Python notebook case, or is this an edge case that Databricks doesn't support?