Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Unable to convert R dataframe to spark dataframe

Paddy_chu
New Contributor III

Hi All, 

Does anyone know how to convert an R dataframe to a Spark dataframe, and then to a Pandas dataframe? Ultimately I want a Pandas dataframe, but I guess I need to convert to Spark first. I've been using the sparklyr library, but my code did not work. This is the code I used in my R cell:

 

%r

library(sparklyr)
library(SparkR)

sc = spark_connect(method = "databricks")
matched_rdf = psm_tbl %>% select(c(code_treat, code_control)) %>% data.frame()
# write the R dataframe to spark
matched_spark = copy_to(sc, matched_rdf, overwrite = TRUE)
 
I assumed matched_spark was already a Spark dataframe, so in the next cell I wrote:

select * from matched_spark

but I got an error saying the "matched_spark" object was not found.
 
I'd appreciate it if anyone could help!
2 REPLIES

Alberto_Umana
Databricks Employee

Hello @Paddy_chu,

Here's an updated version of the R code:

%r

library(sparklyr)
library(SparkR)

sc <- spark_connect(method = "databricks")
matched_rdf <- psm_tbl %>% select(c(code_treat, code_control)) %>% data.frame()

# Write the R dataframe to Spark
matched_spark <- copy_to(sc, matched_rdf, overwrite = TRUE)

# Register the Spark DataFrame as a temporary view to query it using SQL
spark.sql("CREATE OR REPLACE TEMP VIEW matched_view AS SELECT * FROM matched_spark")

Thank you for your response. I just tried this line, but it did not work:
spark.sql("CREATE OR REPLACE TEMP VIEW matched_view AS SELECT * FROM matched_spark")

It gives me an error in my notebook saying spark.sql can't be found. By the way, I'm writing this in an R cell.

However, the following syntax works:

matched_spark %>% sparklyr::sdf_register("matched_view")
 
and then I can use SQL in the next cell to work with matched_view.
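
Since the original goal was a Pandas dataframe, one possible last step is a Python cell that reads the registered view and collects it. This is only a minimal sketch, assuming the temp view matched_view is visible to the notebook's Python `spark` session in the same way it was visible to the SQL cell. The helper name view_to_pandas is hypothetical and not part of any library.

```python
def view_to_pandas(spark, view_name="matched_view"):
    """Read a temp view from the given Spark session and collect it into Pandas.

    `spark` is assumed to be a SparkSession (in a Databricks Python cell,
    the predefined `spark` variable). `view_name` is assumed to be the name
    passed to sdf_register() in the R cell.
    """
    # spark.table() looks up the view by name; toPandas() collects every
    # row to the driver, so this only suits data that fits in driver memory.
    return spark.table(view_name).toPandas()


# Usage in a %python cell (not executed here, since it needs a live session):
# matched_pdf = view_to_pandas(spark, "matched_view")
```

Because toPandas() pulls all rows onto the driver, it may be worth filtering or sampling the view in SQL first if the matched table is large.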