
FeatureEngineeringClient and R

athos
New Contributor

Hi! I'm trying to find a way to create a feature table from R using reticulate.

Is it possible? So far I haven't been able to pass a PySpark DataFrame from R to the create_table() function.

The code I'm trying to get working follows:

install.packages("reticulate")
library(reticulate)
# Point reticulate at the cluster's Python interpreter
os <- import("os")
use_python(os$sys$executable)

library(tidyverse)
library(sparklyr)
# Connect to Spark
spark <- spark_connect(method = "databricks")

fs <- import("databricks.feature_engineering")
fe <- fs$FeatureEngineeringClient()

# Add the row names as an explicit primary-key column, then copy into Spark
mtcars_id <- mtcars %>% rownames_to_column("car_id")
mtcars_sdf <- sdf_copy_to(spark, mtcars_id, overwrite = TRUE)
# spark_dataframe() returns a reference to the underlying Java DataFrame
mtcars_sdf <- spark_dataframe(mtcars_sdf)

fe$create_table(
    name = "databricks_asn.default.mtcars",
    primary_keys = c("car_id"),
    df = mtcars_sdf,
    description = "MTCARS from R"
)

1 REPLY

BigRoux
Databricks Employee
Here's what I can tell you:
  • Creating Databricks feature tables with the create_table() function is well documented for PySpark DataFrames. However, passing a PySpark DataFrame produced in R via sparklyr to create_table() through reticulate is not directly documented or supported.
  • The primary challenge is compatibility: sparklyr's spark_dataframe() hands back a reference to the Java-side DataFrame, not the pyspark.sql.DataFrame that the Feature Engineering client expects, and bridging the two is not described in the available documentation.
  • To work around this limitation, consider creating the feature table directly within PySpark after exporting the relevant data from R. Alternatively, save the DataFrame from R in the Delta table format and load it into a PySpark DataFrame in Python before invoking create_table(); see the sketch below.
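
Here is a minimal, untested sketch of that second workaround, staying entirely in R via reticulate. The staging table name (databricks_asn.default.mtcars_staging) is hypothetical, and this assumes the Python side can attach to the cluster's existing Spark session through SparkSession.builder.getOrCreate():

library(tidyverse)
library(sparklyr)
library(reticulate)

spark <- spark_connect(method = "databricks")

# 1. Stage the data as a table from sparklyr
#    (tables default to the Delta format on Databricks)
mtcars_id <- mtcars %>% rownames_to_column("car_id")
mtcars_sdf <- sdf_copy_to(spark, mtcars_id, overwrite = TRUE)
spark_write_table(mtcars_sdf, "databricks_asn.default.mtcars_staging", mode = "overwrite")

# 2. Re-read the staged table on the Python side; spark.read.table()
#    returns a genuine pyspark.sql.DataFrame, which is what
#    create_table() expects
pyspark_sql <- import("pyspark.sql")
py_spark <- pyspark_sql$SparkSession$builder$getOrCreate()
py_df <- py_spark$read$table("databricks_asn.default.mtcars_staging")

# 3. Create the feature table from the PySpark DataFrame
fe_module <- import("databricks.feature_engineering")
fe <- fe_module$FeatureEngineeringClient()
fe$create_table(
    name = "databricks_asn.default.mtcars",
    primary_keys = "car_id",
    df = py_df,
    description = "MTCARS from R"
)

The round trip through a Delta table sidesteps the object-handoff problem entirely: both runtimes read the same underlying storage instead of trying to share an in-memory DataFrame reference across languages.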
Hope this helps, Lou.
