Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

How to load data using Sparklyr

JefferyReichman
New Contributor III


New to Databricks, and an R user, trying to figure out how to load a Hive table via sparklyr. The path to the table (from right-clicking the file) is https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl. I tried

data_tbl <- tbl(sc, "https://databricks.xxx.xx.gov/#table/xxx_mydata/mydata_etl")

and apparently that isn't correct.

Jeff

1 ACCEPTED SOLUTION


That set of commands didn't seem to work. However, with a little digging and reading, I found that this set of commands did work.

%r
# Load the sparklyr library
library(sparklyr)

# Connect to the cluster (from within a Databricks notebook)
sc <- spark_connect(method = "databricks")

# Switch to the database where the table is located
tbl_change_db(sc, "xxx_mydata")

# Use spark_read_table() to read the table
data_tbl <- spark_read_table(sc, "mydata_etl")
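Once the table is loaded this way, data_tbl is a lazy remote table, so standard dplyr verbs are translated to Spark SQL and run on the cluster rather than in local R. As a sketch (the table and column contents here are just the placeholders from this thread):

%r
library(dplyr)

# Count rows; the computation happens on the cluster
data_tbl %>% count()

# Pull a small sample into local R memory for inspection
local_df <- data_tbl %>% head(10) %>% collect()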


4 REPLIES

Kumaran
Valued Contributor III

Hi @JefferyReichman,

When reading a Hive table through sparklyr, you can use the spark_read_table() function, which reads tables from your cluster's default database or from a specific database.

Here's an example of how to read a Hive table in Sparklyr using a specific database:

 

%r
# Load Sparklyr library
library(sparklyr)

# Connect to the cluster using a service principal
sc <- spark_connect(method = "databricks", 
                    username = "client_id",
                    password = "client_secret",
                    tenant_id = "tenant_id",
                    endpoint = "https://westus2.azuredatabricks.net")

# Set the database where the table is located
database_name <- "xxx_mydata"

# Use spark_read_table() function to read the table
data_tbl <- spark_read_table(sc, in_database(database_name, "mydata_etl"))

 


JefferyReichman
New Contributor III

Thanks - where can I read up on this to get started? - Jeff

Kumaran
Valued Contributor III

Hi @JefferyReichman,

I'm not sure I completely understood your last question about "where I can read up on this for getting started". However, you can start by running the code above in a Databricks Community Edition notebook.
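As a first exercise, the standard sparklyr quickstart pattern (copy a built-in R data frame into Spark, then query it with dplyr) is a reasonable way to get comfortable before pointing at your own Hive tables. This is a generic sketch, not specific to this thread's table:

%r
# Load libraries
library(sparklyr)
library(dplyr)

# Inside a Databricks notebook, this attaches to the notebook's cluster
sc <- spark_connect(method = "databricks")

# Copy a small built-in dataset into Spark to experiment with
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

# Run a dplyr query that executes in Spark
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))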

For more details: Link