Databricks Community

JefferyReichman · 08-20-2023

I'm new to Databricks and an R user, and I'm trying to figure out how to load a Hive table via sparklyr. The path to the table is https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl (copied by right-clicking on the table). I tried

data_tbl <- tbl(sc, "https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl") but apparently that isn't correct.

Jeff
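
The link above is a workspace UI address, not a table identifier, which is likely why tbl() rejects it. Assuming the part after "#table/" has the form <database>/<table> (an assumption based only on the URL in the question), the two names can be pulled out with base R. The helper name parse_table_url is hypothetical and not part of sparklyr:

```r
# Hypothetical helper (not part of sparklyr): split a workspace UI link of the
# assumed form ".../#table/<database>/<table>" into its two name components.
parse_table_url <- function(url) {
  # Drop everything up to and including "#table/"
  fragment <- sub("^.*#table/", "", url)
  # Split the remainder on "/" into database and table
  parts <- strsplit(fragment, "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  list(database = parts[1], table = parts[2])
}

ids <- parse_table_url("https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl")

# Build the database-qualified name that Spark understands
qualified_name <- paste(ids$database, ids$table, sep = ".")
print(qualified_name)
```

The database and table names, rather than the URL, are what the sparklyr read functions expect.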

Kumaran · 08-20-2023

Hi @JefferyReichman,

To read a Hive table through sparklyr, you can use the spark_read_table() function. This function reads a table from your cluster's default database or from a specific database.

Here's an example of how to read a Hive table in Sparklyr using a specific database:

%r
# Load Sparklyr library
library(sparklyr)

# Connect to the cluster using a service principal
sc <- spark_connect(method = "databricks", 
                    username = "client_id",
                    password = "client_secret",
                    tenant_id = "tenant_id",
                    endpoint = "https://westus2.azuredatabricks.net")

# Set the database where the table is located
database_name <- "xxx_mydata"

# Use spark_read_table() function to read the table
data_tbl <- spark_read_table(sc, in_database(database_name, "mydata_etl"))

JefferyReichman · 08-23-2023

That set of commands didn't seem to work. However, with a little digging and reading, I found that this set of commands did work.

%r
# Load the sparklyr library
library(sparklyr)

# Connect to the cluster this notebook is attached to
sc <- spark_connect(method = "databricks")

# Switch the session to the database where the table is located
tbl_change_db(sc, "xxx_mydata")

# Use spark_read_table() to read the table
data_tbl <- spark_read_table(sc, "mydata_etl")
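
For comparison, here is a minimal sketch of two other ways to name the database explicitly when reading the table, so the result does not depend on which database the session currently points at. It assumes the code runs in a Databricks notebook attached to a cluster and that xxx_mydata.mydata_etl exists. The variable name data_sql is illustrative only.

```r
library(sparklyr)
library(dplyr)

# Connect to the cluster the notebook is attached to (assumes a Databricks notebook)
sc <- spark_connect(method = "databricks")

# Option 1: qualify the table with its database via dbplyr::in_schema(),
# leaving the session's current database unchanged
data_tbl <- tbl(sc, dbplyr::in_schema("xxx_mydata", "mydata_etl"))

# Option 2: express the same read as Spark SQL with a qualified table name
data_sql <- sdf_sql(sc, "SELECT * FROM xxx_mydata.mydata_etl")

# Both objects are lazy references to Spark data; collect a few rows to inspect them in R
head(data_tbl, 5) %>% collect()
```

Either form keeps the database name next to the table name, which can be easier to follow in notebooks that read from several databases.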

JefferyReichman · 08-22-2023

Thanks. Where can I read up on this to get started? - Jeff

Kumaran · 08-23-2023

Hi @JefferyReichman,

I'm not sure I completely understood your last question about "where I can read up on this for getting started". However, you can start by running this code in a Databricks Community Edition notebook.

For more details: Link

How to load data using Sparklyr

Connect with Databricks Users in Your Area

Introducing an exclusively Databricks-hosted Assistant

How to present and share your Notebook insights in AI/BI Dashboards

Meet the Databricks MVPs

Now Hiring: Databricks Community Technical Moderator

Insights from a global survey of 1,100 technologists and interviews with 28 CIOs