Understand Spark DataFrames versus R DataFrames

Jeff1
Contributor II

I’ve been struggling to use R in Databricks, and after reading “Mastering Spark with R” I believe my initial problems stemmed from not understanding the difference between Spark DataFrames and R DataFrames within the Databricks environment. Now that I know many R functions only work with R DataFrames, I’ve become quite familiar with collect() and copy_to() for converting back and forth between the two types. So my question is: are there any rules of thumb for working with Spark and R DataFrames when using R in Databricks? It seems as though I am converting back and forth a lot.
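For readers following along, the round trip described in the question can be sketched roughly as below. This is a minimal illustration, not code from the original post: it assumes sparklyr and dplyr are installed on the cluster, that `spark_connect(method = "databricks")` is the right way to attach in your workspace, and it uses the built-in `mtcars` data with a made-up table name `mtcars_spark`.

```r
library(sparklyr)
library(dplyr)

# Assumption: running in a Databricks notebook, where sparklyr can attach
# to the existing cluster with method = "databricks".
sc <- spark_connect(method = "databricks")

# R data frame -> Spark DataFrame: the data is shipped to the cluster.
# "mtcars_spark" is a hypothetical table name chosen for this example.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# cars_tbl is a reference to a remote table, not an in-memory R data frame.
class(cars_tbl)

# Spark DataFrame -> R data frame: every row is pulled into driver memory.
cars_local <- collect(cars_tbl)
class(cars_local)
```

Functions that expect an ordinary R data frame can be given `cars_local`, while dplyr verbs applied to `cars_tbl` are translated and executed in Spark.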

Jeff

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

ggplot2 is not included by default I believe. You will have to install it yourself.

https://spark-packages.org/package/SKKU-SKT/ggplot2.SparkR

http://papl-skku.github.io/ggplot2.SparkR/index

As it is a popular package, there is a real chance it will be included in the future.

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Because Spark DataFrames are processed in a distributed way on the workers, it is better to stay with Spark DataFrames wherever possible. In addition, collect() runs on the driver and loads the whole dataset into driver memory, so it shouldn't be used in production.

That certainly makes sense, but I've run into a number of R functions that error out on Spark DataFrames. For example, geohashTools and ggplot2 (ggplot2 in particular) only work with R DataFrames, as I understand it.
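One way to reconcile the two points above is to do the heavy reduction in Spark and collect only the small result that an R-only function such as ggplot2 needs. The sketch below is an illustration under assumptions rather than a recommendation from the thread: it assumes sparklyr, dplyr and ggplot2 are installed, that `spark_connect(method = "databricks")` applies to your workspace, and it reuses the built-in `mtcars` data with a hypothetical table name.

```r
library(sparklyr)
library(dplyr)
library(ggplot2)

# Assumption: running in a Databricks notebook attached to a cluster.
sc <- spark_connect(method = "databricks")

# Stand-in for a large Spark DataFrame; "mtcars_spark" is a hypothetical name.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# The grouping and aggregation are translated to Spark SQL and run on the
# workers. Only the summary (one row per cyl value) reaches the driver.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

# mpg_by_cyl is an ordinary R data frame (a tibble), so ggplot2 accepts it.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean mpg")
print(p)
```

The rule of thumb this illustrates: filter, join and aggregate while the data is still a Spark DataFrame, and call collect() only once the result is small enough to fit comfortably in driver memory.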

Kaniz
Community Manager

Hi @Jeff Reichman, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
