Hello,
We're using Databricks on AWS with R, and we've recently started using Delta tables.
The code below[1] works in a notebook, but when we run it from RStudio on a Databricks cluster we get the following error:
java.lang.IllegalStateException: Cannot find the REPL id in Spark local properties.
The current mode doesn't support transactional writes from different clusters.
You can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'.
If this is disabled, writes to a single table must originate from a single cluster.
Please check https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq for more details.
We've tried runtimes 8.1, 10.4, and 11.1 beta. The code only runs fine when we set spark.databricks.delta.multiClusterWrites.enabled to false.
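For reference, this is roughly how we apply the workaround from R (a sketch; the same property can instead be set in the cluster's Spark config):

# Workaround sketch: disable multi-cluster writes for the current session.
# The equivalent cluster-level Spark config entry would be:
#   spark.databricks.delta.multiClusterWrites.enabled false
SparkR::sql("SET spark.databricks.delta.multiClusterWrites.enabled = false")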
What should we do to keep multi-cluster writes enabled while running from RStudio?
Thank you,
Radu
[1]
sdf = SparkR::as.DataFrame(df)

if (tolower(dataset_name) %in% SparkR::tableNames(db_name)) {
  # Append data. No need to specify partitioning; the existing table's partitioning is used.
  SparkR::write.df(
    sdf,
    path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),
    mode = 'append',
    source = 'delta',
    mergeSchema = TRUE
  )
} else {
  # First create the Databricks-managed Delta table
  SparkR::saveAsTable(
    sdf,
    tableName = tolower(paste0(db_name, '.', dataset_name)),
    mode = 'append',
    source = 'delta',
    mergeSchema = TRUE
  )
  # Overwrite the table with partitioning information
  SparkR::write.df(
    sdf,
    path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),
    mode = 'overwrite',
    source = 'delta',
    overwriteSchema = TRUE,
    partitionBy = c('organization')
  )
}