
Load GCP data to Databricks using R

dZegpi
New Contributor II

I'm working with Databricks and Google Cloud in the same project. I want to load specific datasets stored in GCP into an R notebook in Databricks. I can currently see the datasets in BigQuery. The problem is that, using the sparklyr package, I'm not able to see the datasets I have in GCP. Instead, I see the names of other datasets I was not aware of.

This is my R code:

library(sparklyr)
sc <- spark_connect(method = "databricks")

# List table names visible to my Spark connection
dplyr::src_tbls(sc)
# Can't see the tables stored in GCP

# Try (and fail) to load the tables
spark_read_table(sc, "my-table-name")

How can I access my tables stored in GCP from Databricks notebooks using R? Is sparklyr the right approach?

3 REPLIES

Kaniz
Community Manager

Hi @dZegpi, to access your Google Cloud Platform (GCP) datasets in Databricks notebooks using R, you can use the sparklyr package. Let’s break down the steps:

  1. Load the required packages: In a Databricks workspace you don’t need to install them; they are already included in the Databricks Runtime. Run the following code to load SparkR, sparklyr, and dplyr:
    library(SparkR)
    library(sparklyr)
    library(dplyr)

  2. Connect to the Databricks cluster: Use spark_connect to establish a connection, specifying the connection method as “databricks”:
    sc <- spark_connect(method = "databricks")

Note that if you’re working within a Databricks notebook, a SparkSession is already established, so you don’t need to call SparkR::sparkR.session.

  3. Query GCP tables: To access your GCP datasets, you can use SQL queries to bridge SparkR and sparklyr:

  • Use SparkR::sql to query tables created with sparklyr.
  • Use sparklyr::sdf_sql to query tables created with SparkR.
  • Remember that dplyr code gets translated to SQL in memory before execution.
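A minimal sketch of that bridging, assuming a live Databricks notebook session and a hypothetical table name "my_table" already registered in the session's catalog:

```r
library(sparklyr)

# In a Databricks notebook a SparkSession already exists;
# spark_connect just attaches to it.
sc <- spark_connect(method = "databricks")

# From SparkR, query a table that sparklyr registered:
sdf_a <- SparkR::sql("SELECT * FROM my_table LIMIT 10")

# From sparklyr, query a table that SparkR registered:
sdf_b <- sparklyr::sdf_sql(sc, "SELECT * FROM my_table LIMIT 10")
```

Both calls run the same SQL against the shared session catalog; they only differ in which R interface returns the resulting Spark DataFrame.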

Example: Suppose you want to read a table named “my-table-name”. You can do so using the following code:

# Read the table into a Spark DataFrame
df <- spark_read_table(sc, "my-table-name")

Now you can work with the df DataFrame in your R notebook. Keep in mind that spark_read_table only sees tables registered in the cluster’s catalog (metastore); BigQuery datasets won’t appear there unless they have been registered as tables in Databricks.
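If the BigQuery tables are not registered in the Databricks catalog, one possible approach is to read them through the Spark BigQuery connector as a generic data source. This is a sketch only: it assumes the BigQuery connector is available on your cluster, and "project-id", "my_dataset", and "my_table" are placeholder names you must replace with your own:

```r
library(sparklyr)

sc <- spark_connect(method = "databricks")

# Placeholders: "project-id", "my_dataset", "my_table" are illustrative,
# not real identifiers. parentProject is the GCP project billed for the read.
df <- spark_read_source(
  sc,
  name    = "my_table",                          # name for the Spark DataFrame
  path    = "project-id.my_dataset.my_table",    # fully qualified BigQuery table
  source  = "bigquery",
  options = list(parentProject = "project-id")
)

# df is now a regular sparklyr table reference usable with dplyr verbs
dplyr::glimpse(df)
```

This mirrors the Scala/Python pattern `spark.read.format("bigquery").load("project.dataset.table")`; whether it works as-is depends on your cluster having the connector and credentials configured for your GCP project.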

dZegpi
New Contributor II

Other than library(SparkR), this is the same code I posted. This does not solve my problem.

Kaniz
Community Manager

Our End-of-Year Community Survey is here! Please take a few moments to complete the survey. Your feedback matters!