Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

How to load data using Sparklyr

JefferyReichman
New Contributor III


New to Databricks, and an R user, trying to figure out how to load a Hive table via sparklyr. The path to the table (from right-clicking the file) is https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl. I tried

data_tbl <- tbl(sc, "https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl")

and apparently that isn't correct.

Jeff

1 ACCEPTED SOLUTION


That set of commands didn't seem to work. However, with a little digging and reading, I found that this set of commands did work.

%r
# Load the sparklyr library
library(sparklyr)

# Connect to the cluster (from within a Databricks notebook)
sc <- spark_connect(method = "databricks")

# Switch to the database where the table is located
tbl_change_db(sc, "xxx_mydata")

# Use spark_read_table() to read the table
data_tbl <- spark_read_table(sc, "mydata_etl")
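Once the table is loaded this way, data_tbl is a lazy remote table, so standard dplyr verbs are translated to Spark SQL and run on the cluster rather than in local R. As a sketch (the table and column contents here are just the placeholders from this thread):

%r
library(dplyr)

# Count rows; the computation happens on the cluster
data_tbl %>% count()

# Pull a small sample into local R memory for inspection
local_df <- data_tbl %>% head(10) %>% collect()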


4 REPLIES

Kumaran
Valued Contributor III

Hi @JefferyReichman,

When reading a Hive table through sparklyr, you can use the spark_read_table() function, which reads tables from your cluster's default database or from a specific database.

Here's an example of how to read a Hive table in Sparklyr using a specific database:

 

%r
# Load Sparklyr library
library(sparklyr)

# Connect to the cluster using a service principal
sc <- spark_connect(method = "databricks", 
                    username = "client_id",
                    password = "client_secret",
                    tenant_id = "tenant_id",
                    endpoint = "https://westus2.azuredatabricks.net")

# Set the database where the table is located
database_name <- "xxx_mydata"

# Use spark_read_table() function to read the table
data_tbl <- spark_read_table(sc, in_database(database_name, "mydata_etl"))

 


JefferyReichman
New Contributor III

Thanks - where can I read up on this to get started? - Jeff

Kumaran
Valued Contributor III

Hi @JefferyReichman,

I'm not sure I completely understood your last question about "where I can read up on this for getting started". However, you can start by running the code above in a Databricks Community Edition notebook.
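As a first exercise, the standard sparklyr quickstart pattern (copy a built-in R data frame into Spark, then query it with dplyr) is a reasonable way to get comfortable before pointing at your own Hive tables. This is a generic sketch, not specific to this thread's table:

%r
# Load libraries
library(sparklyr)
library(dplyr)

# Inside a Databricks notebook, this attaches to the notebook's cluster
sc <- spark_connect(method = "databricks")

# Copy a small built-in dataset into Spark to experiment with
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Run a dplyr query that executes in Spark
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))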

For more details: Link