When trying to incorporate an R package into my Spark workflow using the spark_apply() function in sparklyr, I get the error:
Error: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
I simplified this to a minimal test case, and the error still occurs with a chunk that should run fine:
data %>%
  sparklyr::spark_apply(function(data) data * 10)
My connection is:
sc <- sparklyr::spark_connect(method = "databricks")
Searching online suggests this error is caused by a conflict between a package and the current version of Spark, though I haven't found discussions specifically about sparklyr. I'm running sparklyr 1.8.1, and I've tried installing it from both CRAN and GitHub. I'm using Databricks Runtime 14.3.x-photon-scala2.12. Other info:
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
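In case it helps, this is how I confirmed the versions in play (a quick check run on the cluster; `spark_version()` is from sparklyr and `packageVersion()` is base R):

    library(sparklyr)

    # Connect to the Databricks cluster (same connection as above)
    sc <- sparklyr::spark_connect(method = "databricks")

    # Spark version the cluster is actually running
    sparklyr::spark_version(sc)

    # Locally installed sparklyr package version
    packageVersion("sparklyr")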
I'm a data scientist, not an engineer, so any advice on how to get this running would be appreciated. Thanks.