
Error when calling SparkR from within a Python notebook

rshark
New Contributor II

I’ve had success with R magic (R cells in a Python notebook) and running an R script from a Python notebook, up to the point of connecting R to a Spark cluster. In either case, I can’t get a `SparkSession` to initialize.

Two-cell (Python) notebook example:

Cell 1:
%load_ext rpy2.ipython

Cell 2:
%%R
library(SparkR)
sparkR.session()

Error message from cell 2:

R[write to console]: Spark package found in SPARK_HOME: /databricks/spark
 
Launching java with spark-submit command /databricks/spark/bin/spark-submit   sparkr-shell /tmp/RtmpPjujEO/backend_port19cf5178fd7d 
R[write to console]: Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap,  : 
  JVM is not ready after 10 seconds
 
R[write to console]: In addition: 
R[write to console]: There were 50 or more warnings (use warnings() to see the first 50)
R[write to console]: 
 
 
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap,  : 
  JVM is not ready after 10 seconds
RInterpreterError: Failed to parse and evaluate line 'sparkR.session()'.

For my collaboration use case, it would be more efficient to work in a Python notebook and drop into native R for the analysis. Is it possible to use SparkR from within a Python notebook in Databricks?

3 REPLIES

Dooley
Valued Contributor

Yes, you can use SparkR in Databricks notebooks and keep your native R code. At the top of the notebook in the Databricks UI, you can set the notebook's language to R, so you don't need to add %%R to every cell.

You can also import the IPython notebook you are using into Databricks, which converts it to a Databricks notebook. Then set the language to R at the top and you are ready to run.

For collaboration, could the other person be given restricted Databricks access so they can work on the notebooks with you in the Databricks UI? Databricks notebooks let multiple people edit at the same time, you can share notebooks with one another, and you can leave comments for each other to help your collaboration.

rshark
New Contributor II

I'm specifically interested in running a Python notebook and executing R code from within it. I have no problem connecting R to a Spark cluster when the notebook language is set to R. Is there a way to connect R to Spark in the Python notebook case, or is this an edge case that Databricks doesn't support?

Dooley
Valued Contributor

The workaround I can suggest is to call the R notebooks from your Python notebook, saving each DataFrame as a Delta table to pass data between the languages.
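The Delta-table hand-off described above could look roughly like the following Python sketch. It assumes a Databricks runtime, where the notebook supplies `spark` and `dbutils`; the helper name `run_r_notebook_via_delta`, the notebook path, and the table names are hypothetical, and the R notebook is assumed to read its input table name from a widget and write its result to the output table. `dbutils.notebook.run(path, timeout_seconds, arguments)` starts the target notebook as a separate run, so the SparkR code executes in a native R notebook rather than through rpy2.

```python
# Minimal sketch, assuming a Databricks runtime that provides `spark` and
# `dbutils`. The helper, notebook path, and table names are hypothetical.

def run_r_notebook_via_delta(spark, dbutils, df, notebook_path,
                             input_table, output_table, timeout_seconds=600):
    """Write `df` to a Delta table, run an R notebook on it, read its output.

    Assumes (hypothetically) that the R notebook reads the widget
    `input_table`, does its SparkR work, and saves its result to the
    table named by the widget `output_table`.
    """
    # 1. Persist the input so the R notebook can load it with SparkR.
    df.write.format("delta").mode("overwrite").saveAsTable(input_table)

    # 2. Run the R notebook as a separate notebook run. Arguments arrive
    #    in the target notebook as widget values and must be strings.
    exit_value = dbutils.notebook.run(
        notebook_path,
        timeout_seconds,
        {"input_table": input_table, "output_table": output_table},
    )

    # 3. Load the table the R notebook wrote back into a Python DataFrame,
    #    and return the R notebook's exit value alongside it.
    return spark.table(output_table), exit_value
```

A call might look like `result_df, status = run_r_notebook_via_delta(spark, dbutils, my_df, "./r_analysis", "tmp_py_to_r", "tmp_r_to_py")`. On the R side, the notebook could read its input with something like `tableToDF(dbutils.widgets.get("input_table"))` and write its result with SparkR's `saveAsTable(..., source = "delta", mode = "overwrite")`; treat those details as assumptions to check against your runtime.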

For how to call one notebook from another, here is a link.
