Databricks Community

Jeff1 · ‎03-14-2022

I’ve been struggling with using R in Databricks, and after reading “Mastering Spark with R,” I believe my initial problems stemmed from not understanding the difference between Spark DataFrames and R DataFrames within the Databricks environment. Now that I know many R functions only work with R DataFrames, I’ve become quite familiar with collect() and copy_to() for converting back and forth between the two types. So my question is: are there any rules of thumb for working with Spark and R DataFrames when using R in Databricks? It seems as though I am converting back and forth a lot.

Jeff

Hubert-Dudek · ‎03-14-2022

Since Spark DataFrames are processed in a distributed way on the workers, it is better to just use Spark DataFrames. Additionally, collect() is executed on the driver and pulls the whole dataset into the driver's memory, so it shouldn't be used in production.
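As a rough illustration of that advice, here is a minimal sketch using sparklyr and dplyr. It assumes a local Spark connection and the built-in mtcars data purely for demonstration (on Databricks the connection would typically be spark_connect(method = "databricks")); the table name "cars" is made up. The idea is to keep the heavy work in Spark and collect only a small result.

```r
library(sparklyr)
library(dplyr)

# Assumption: local connection for illustration only.
sc <- spark_connect(master = "local")

# copy_to() moves an R data frame into Spark (fine for small demo data).
cars_tbl <- copy_to(sc, mtcars, name = "cars", overwrite = TRUE)

# These dplyr verbs are translated to Spark SQL and run in Spark,
# not in the local R session.
by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(n = n(), avg_mpg = mean(mpg, na.rm = TRUE))

# Only the small aggregated result is brought back to the driver.
by_cyl_local <- collect(by_cyl)
```

Here by_cyl is still a Spark DataFrame and nothing is pulled into R until collect() is called on the already aggregated result.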

Jeff1 · ‎03-14-2022

That certainly makes sense, but I've run into a number of R functions that error out on Spark DataFrames. For example, geohashTools and ggplot2 (ggplot2 in particular) only work with R DataFrames, as I understand it.
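For the ggplot2 case, one possible pattern (a sketch, not the only approach) is to reduce the data in Spark first and hand only the small, collected result to ggplot2. The local connection, the mtcars data and the table name "cars_plot" are assumptions made for the example.

```r
library(sparklyr)
library(dplyr)
library(ggplot2)

# Assumption: local demo connection and demo data.
sc <- spark_connect(master = "local")
cars_tbl <- copy_to(sc, mtcars, name = "cars_plot", overwrite = TRUE)

# Reduce the data in Spark first: one row per (cyl, gear) group.
plot_data <- cars_tbl %>%
  group_by(cyl, gear) %>%
  summarise(avg_hp = mean(hp, na.rm = TRUE)) %>%
  collect()  # from here on this is an ordinary R data frame (tibble)

# ggplot2 works on the collected R data frame.
p <- ggplot(plot_data, aes(x = factor(cyl), y = avg_hp, fill = factor(gear))) +
  geom_col(position = "dodge") +
  labs(x = "cyl", y = "average hp", fill = "gear")
```

The plot only ever sees the aggregated rows, so the size of the original Spark table does not affect driver memory.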

-werners- · ‎03-15-2022

ggplot2 is not included by default I believe. You will have to install it yourself.

https://spark-packages.org/package/SKKU-SKT/ggplot2.SparkR

http://papl-skku.github.io/ggplot2.SparkR/index

As it is a popular package, there is a real chance it will be included in the future.

Understand Spark DataFrames versus R DataFrames
