Community Discussions

how to use R in databricks

JCamiloCS
New Contributor

Hello everyone.

I am a new user of Databricks, which was recently implemented at the company where I work. I am a business analyst and I know some R, though not much. When I saw that Databricks could use R I was very excited, because I thought that my knowledge of R, although basic, could help me get started in Databricks. Unfortunately, the excitement soon turned into frustration, disappointment and helplessness. I have read a couple of articles about running R on Databricks, but my attempts result in error after error; it seems that nothing I know about base R, tidyverse or ggplot2 will work in Databricks. That's why I am turning to the R experts in Databricks, because I'm already tired of experimenting. If you can point me to resources where I can learn how to use R in Databricks, I would greatly appreciate it, preferably in Spanish, but other languages are fine.

I hope that those who learned in R understand my emotions a little and forgive the length of the post, I don't want it to feel like a complaint.

It should be noted that I do not know Python, so I was hoping to find a familiar environment so that I would not have to start learning another language and delay getting started with Databricks.

Any help would be appreciated. Thanks in advance

 

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @JCamiloCS, let’s dive into using R in Databricks.

 

Databricks is a powerful platform for big data analytics and machine learning, and it integrates seamlessly with Apache Spark™. 

 

As a business analyst, leveraging your existing knowledge of R can be incredibly valuable. 

 

Here are some steps to get you started:

 

Create a Databricks Account:

  • If you haven’t already, sign up for a Databricks account. You can use the free Community Edition or explore the paid options.

Understanding Databricks Notebooks:

  • Databricks provides interactive notebooks where you can write and execute code. These notebooks support multiple languages, including R.
  • Create a new notebook by clicking on “Workspace” > “Create” > “Notebook.”

Select R as the Language:

  • When creating a new notebook, choose “R” as the language.
  • You’ll see a cell where you can write and execute R code.

Install Required Libraries:

  • Databricks provides pre-installed libraries, but you can also install additional R packages.
  • To install a package, run install.packages("package_name") in a notebook cell, replacing package_name with the package you need.
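The install step above might look like the following in an R notebook cell. This is a minimal sketch: "janitor" is only an example package name chosen for illustration, not one the original answer mentions.

```r
# Install a package from CRAN for use in this notebook.
# "janitor" is an example name only; substitute the package you need.
install.packages("janitor")

# Load the package so its functions are available in later cells.
library(janitor)
```

Packages that already ship with the cluster (dplyr and ggplot2 typically do) only need the library() call.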

Load Data:

  • You can load data into Databricks from various sources (CSV, Parquet, etc.). To read a CSV file, first open a Spark connection with spark_connect() from the sparklyr package, then pass that connection to spark_read_csv().
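A sketch of reading a CSV file with sparklyr from inside a Databricks notebook. The file path and the table name "my_data" are placeholders assumed for illustration; replace them with your own.

```r
library(sparklyr)

# Connect to the Spark cluster this notebook is attached to.
sc <- spark_connect(method = "databricks")

# Hypothetical path: replace with the location of your own CSV file.
csv_path <- "/path/to/your_file.csv"

# Read the CSV into a Spark table registered under the name "my_data".
my_data <- spark_read_csv(
  sc,
  name = "my_data",
  path = csv_path,
  header = TRUE,        # first row holds the column names
  infer_schema = TRUE   # let Spark guess the column types
)
```

The object my_data is a reference to a table that lives in Spark, not a local R data frame, which explains why some base R functions behave differently on it.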

Explore Data:

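  • Use familiar R functions like head(), summary(), and str() to explore your data. head() also works directly on a Spark table; summary() and str() are designed for local R data frames, so bring a small sample into R with collect() before calling them.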
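As a quick illustration, these exploration functions behave in a Databricks R notebook exactly as they do on a laptop when given a local data frame. The sketch uses mtcars, a dataset that ships with R, so it runs without loading any file.

```r
# mtcars is built into R; it stands in for your own data here.
cars <- mtcars

head(cars, 5)       # first five rows
summary(cars$mpg)   # minimum, quartiles, mean, and maximum of one column
str(cars)           # column names, types, and a preview of the values
dim(cars)           # number of rows and columns
```

For a Spark table, one option is to pull a sample into R first, for example `local_sample <- collect(head(my_data, 1000))`, where my_data is a hypothetical sparklyr table, and then explore local_sample the same way.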

Data Manipulation with dplyr:

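  • sparklyr provides a dplyr backend for Spark DataFrames, so the dplyr verbs you already know work on Spark tables.
  • Use functions like filter(), select(), and mutate() to manipulate data. On a Spark table they are translated to Spark SQL and run on the cluster; call collect() to bring the result back into R.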
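A minimal sketch of these verbs, shown on the local mtcars data frame so that it runs anywhere dplyr is installed. The column choices and the derived column wt_kg are illustrative only.

```r
library(dplyr)

# mtcars ships with R and stands in for your own data.
result <- mtcars %>%
  filter(cyl == 4) %>%                # keep only 4-cylinder cars
  select(mpg, wt, hp) %>%             # keep three columns
  mutate(wt_kg = wt * 1000 * 0.4536)  # wt is in 1000 lbs; convert to kg

head(result)
```

The same pipeline can be applied to a table created with sparklyr; in that case the work happens in Spark and nothing is returned to R until collect() is called.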

Visualizations with ggplot2:

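  • You can create visualizations using the ggplot2 package. ggplot2 plots local R data frames, so filter or aggregate a Spark table and collect() the result before plotting it.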
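A small ggplot2 sketch, again using the built-in mtcars data frame as a stand-in for your own data. The axis labels and title are illustrative.

```r
library(ggplot2)

# ggplot2 draws from a local R data frame; mtcars ships with R.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel efficiency by weight"
  )

# Printing the plot object renders the chart below the notebook cell.
print(p)
```

If the data lives in Spark, it is usually best to reduce it first (filter, aggregate, or sample) so that only a small data frame is collected for plotting.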

Run Spark Jobs:

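  • Databricks allows you to run Spark jobs using R, through the sparklyr or SparkR packages.
  • Only operations on Spark tables are distributed across the cluster. Ordinary R code (base R, or tidyverse functions on local data frames) runs on the driver node, just as it would on a single machine.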
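To make the distinction concrete, here is a sketch of a small job that Spark executes on the cluster, assuming the notebook is attached to a running cluster. The table name "mtcars_spark" is an arbitrary choice for illustration.

```r
library(sparklyr)
library(dplyr)

# Connect to the Spark cluster this notebook is attached to.
sc <- spark_connect(method = "databricks")

# Copy a small built-in data frame to Spark purely for illustration.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# This pipeline is translated to Spark SQL and executed on the cluster.
avg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()   # bring the small aggregated result back into R

avg_by_cyl
```

After collect(), avg_by_cyl is an ordinary local data frame, so base R, tidyverse, and ggplot2 code works on it as usual.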

Learn from Examples:

  • Explore Databricks’ sample notebooks and tutorials. They cover a wide range of topics and use cases.
  • Modify and experiment with these examples to learn how R works in Databricks.

Remember that Databricks also supports Python, but if you prefer R, stick with it! 

 

Feel free to ask if you have any specific questions or encounter issues. 

 

Happy exploring! 🚀

 


