
Recommended database when using R in databricks

Jeff1 — Wed, 09 Mar 2022 15:57:30 GMT

I'm new to integrating the sparklyr / R interface in Databricks. In particular, it appears that sparklyr and R commands and functions depend on the type of dataframe one is working with (Hive, SparkR, etc.). Is there a recommended best practice as to which dataframe I should start with while working with R in Databricks?

Jeff

Re: Recommended database when using R in databricks

Hubert-Dudek — Wed, 09 Mar 2022 16:00:38 GMT

The recommended format is Delta in a data lake. Here is a code example: https://docs.databricks.com/delta/quick-start.html#language-r

Re: Recommended database when using R in databricks

Jeff1 — Wed, 09 Mar 2022 17:26:28 GMT

OK, then as I'm reading through the reference material, I'm not finding how to convert a Hive table to the Delta format. I'm assuming my initial data is a Hive table, as I've had to use tbl() to read in the data. Would I simply use a SQL statement to read in the data as a Delta table and then write it back out?

Re: Recommended database when using R in databricks

Hubert-Dudek — Wed, 09 Mar 2022 18:17:12 GMT

Hi, if your Hive table is registered in the metastore, then yes, you can use SQL syntax.

Then it is enough to use COPY INTO.

If your table is not registered, please map it in the metastore:

CREATE TABLE IF NOT EXISTS tableName (fields) USING data_format LOCATION 'path'

Then you can create another table USING the delta format and then copy between the tables.
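A minimal sketch of the "create a Delta table and copy into it" step, run from R through sparklyr's DBI interface. It assumes a Databricks notebook attached to a running cluster and a source table that is already registered in the metastore. The table names my_hive_table and my_delta_table are hypothetical. Note that it uses CREATE TABLE ... AS SELECT for the table-to-table copy rather than COPY INTO, which is a substitution on my part and not necessarily the approach meant above.

```r
library(sparklyr)
library(DBI)

# Assumption: this runs inside a Databricks notebook, so the
# "databricks" connection method attaches to the notebook's cluster.
sc <- spark_connect(method = "databricks")

# Hypothetical table names, for illustration only.
source_table <- "my_hive_table"
delta_table  <- "my_delta_table"

# Create a Delta copy of the registered table in a single statement.
dbGetQuery(sc, paste0(
  "CREATE TABLE IF NOT EXISTS ", delta_table,
  " USING DELTA AS SELECT * FROM ", source_table
))

# Inspect the new table; the "format" column should report delta.
dbGetQuery(sc, paste0("DESCRIBE DETAIL ", delta_table))
```

After this, tbl(sc, "my_delta_table") should give the same kind of lazy sparklyr reference as before, just backed by Delta storage.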

Re: Recommended database when using R in databricks

Jeff1 — Wed, 09 Mar 2022 18:55:45 GMT

@Hubert Dudek, OK, that's helpful. As I'm reading the Databricks documentation, it appears that when I read in my file using the sparklyr tbl() function in Databricks, it returns a sparklyr object ("tbl_spark" "tbl_sql" "tbl_lazy" "tbl"). So does your previous reply still hold true? Either way, based upon your original reply, it would be to my benefit to convert the sparklyr object into a Delta table, yes? If that's true, that's what I'm seeking in the documentation: how to do that.

Re: Recommended database when using R in databricks

Hubert-Dudek — Mon, 18 Apr 2022 09:30:16 GMT

Hi, have you found how to convert it?
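In case it helps, here is a rough sketch of one way to go from the sparklyr object returned by tbl() to Delta, using sparklyr's own spark_write_delta() and spark_read_delta(). It assumes a Databricks notebook attached to a cluster. The table name and the storage path are hypothetical placeholders, so swap in your own.

```r
library(sparklyr)
library(dplyr)

# Assumption: running in a Databricks notebook attached to a cluster.
sc <- spark_connect(method = "databricks")

# tbl() returns a lazy tbl_spark reference to the registered table.
src <- tbl(sc, "my_hive_table")            # hypothetical table name

# Hypothetical storage location for the Delta files.
delta_path <- "/tmp/delta/my_delta_table"

# Write the data behind the tbl_spark out in Delta format.
spark_write_delta(src, path = delta_path, mode = "overwrite")

# Read it back; the result is again a tbl_spark, now backed by Delta.
delta_tbl <- spark_read_delta(sc, path = delta_path, name = "my_delta_table")
class(delta_tbl)
```

The object you work with in R stays a tbl_spark either way, so existing dplyr code should carry over. What changes is the storage format underneath.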