Data Engineering

Recommended database when using R in databricks

Jeff1
Contributor II

I'm new to the sparklyr / R interface in Databricks. In particular, it appears that sparklyr and R commands and functions depend on the type of dataframe one is working with (Hive, SparkR, etc.). Is there a recommended best practice as to which dataframe I should start with while working with R in Databricks?

Jeff

1 ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III

The recommended approach is to store your data in Delta format in the data lake. Here is a code example: https://docs.databricks.com/delta/quick-start.html#language-r

5 REPLIES

OK. As I read through the reference material, I can't find how to convert a Hive table to the Delta format. I'm assuming my initial data is a Hive table, since I've had to use tbl() to read it in. Would I simply use a SQL statement to read in the data as a Delta table and then write it back out?

Hubert-Dudek
Esteemed Contributor III

Hi, if your Hive table is registered in the metastore then yes, you can use SQL syntax.

Then it is enough to use COPY INTO.

If your table is not registered, please map it in the metastore first:

CREATE TABLE IF NOT EXISTS tableName (fields) USING data_format LOCATION 'path'

Then you can create another table USING DELTA and copy the data between the tables.
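
For readers following along from R, the "create another table USING DELTA and copy" step can be sketched from a sparklyr session. This is only an illustration under assumptions: `my_hive_table` and `my_delta_table` are hypothetical names, the source table is assumed to be registered in the metastore already, and the SQL is sent through the DBI interface that sparklyr connections expose. It uses CREATE TABLE ... AS SELECT for the table-to-table copy rather than COPY INTO, since the Databricks documentation describes COPY INTO as loading from a file location.

```r
library(sparklyr)
library(DBI)

# On Databricks, attach sparklyr to the cluster's existing Spark session.
sc <- spark_connect(method = "databricks")

# Hypothetical names: my_hive_table is the existing registered table,
# my_delta_table is the new Delta copy created from its rows.
dbExecute(sc, "
  CREATE TABLE IF NOT EXISTS my_delta_table
  USING DELTA
  AS SELECT * FROM my_hive_table
")

# Inspect the new table's metadata; the Provider row should report the format.
dbGetQuery(sc, "DESCRIBE EXTENDED my_delta_table")
```

Once the Delta table exists, it can be read from R with tbl(sc, "my_delta_table") in the same way as the original table.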

Jeff1
Contributor II

@Hubert Dudek, OK, that's helpful. As I read the Databricks documentation, it appears that when I read in my file using the sparklyr tbl() function in Databricks, it returns a sparklyr object ("tbl_spark" "tbl_sql" "tbl_lazy" "tbl"). So does your previous reply still hold true? Either way, based on your original reply, it would be to my benefit to convert the sparklyr object into a Delta table, yes? If that's true, that is what I'm looking for in the documentation: how to do that.
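
One possible way to do that conversion directly from the sparklyr object is sketched below. It is not taken from the thread: the table name and the output path are hypothetical, and it assumes a sparklyr version that provides spark_write_delta() and spark_read_delta(). The object returned by tbl() is a lazy reference to the table, so writing it out runs the underlying query and saves the result in Delta format.

```r
library(sparklyr)
library(dplyr)

# On Databricks, attach sparklyr to the cluster's existing Spark session.
sc <- spark_connect(method = "databricks")

# tbl() returns a lazy reference to the table, not the data itself.
hive_tbl <- tbl(sc, "my_hive_table")        # hypothetical table name
class(hive_tbl)

# Hypothetical output location; replace with a path you can write to.
delta_path <- "/tmp/delta/my_hive_table"

# Write the rows behind the reference out in Delta format.
spark_write_delta(hive_tbl, path = delta_path, mode = "overwrite")

# Re-open the Delta data as a new Spark table reference for dplyr work.
delta_tbl <- spark_read_delta(sc, path = delta_path, name = "my_hive_table_delta")
delta_tbl %>% head(5) %>% collect()
```

The result of spark_read_delta() is again a tbl_spark, so existing dplyr code written against the original table should carry over with only the table reference changed.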

Hubert-Dudek
Esteemed Contributor III

Hi, have you found how to convert it?
