I’ve had success both with R magic (R cells in a Python notebook) and with running an R script from a Python notebook, right up to the point of connecting R to the Spark cluster. In either case, I can’t get a `SparkSession` to initialize.
2-cell (Python) notebook example:
Cell 1:

```python
%load_ext rpy2.ipython
```

Cell 2:

```r
%%R
library(SparkR)
sparkR.session()
```
Error message from cell 2:
```
R[write to console]: Spark package found in SPARK_HOME: /databricks/spark
Launching java with spark-submit command /databricks/spark/bin/spark-submit sparkr-shell /tmp/RtmpPjujEO/backend_port19cf5178fd7d
R[write to console]: Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
R[write to console]: In addition:
R[write to console]: There were 50 or more warnings (use warnings() to see the first 50)
R[write to console]:
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  JVM is not ready after 10 seconds
RInterpreterError: Failed to parse and evaluate line 'sparkR.session()'.
```
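The R-script approach I mentioned hits the same wall: the session never initializes. For reference, that variant is just a subprocess call along these lines (the script path and name here are illustrative, not my actual code):

```python
# Sketch of the "run an R script from a Python notebook" variant.
# The script at this (illustrative) path contains only:
#   library(SparkR); sparkR.session()
import subprocess

result = subprocess.run(
    ["Rscript", "/dbfs/tmp/sparkr_session_test.R"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```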
For my collaboration use case, it would be more efficient to work in a Python notebook and do the analysis itself in native R. Is it possible to use SparkR from within a Python notebook in Databricks?
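To make the goal concrete, this is roughly what I would like to be able to run in a `%%R` cell once a session is available (the data frame is just a placeholder example):

```r
%%R
library(SparkR)
sparkR.session()                  # the call that currently fails
sdf <- createDataFrame(faithful)  # any SparkR operation would do here
head(sdf)
```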