<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Support for Delta tables multicluster writes in Databricks cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/support-for-delta-tables-multicluster-writes-in-databricks/m-p/19753#M13286</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're using Databricks on AWS and have recently started using Delta tables.&lt;/P&gt;&lt;P&gt;We're using R.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The code below [1] works in a notebook, but when we run it from RStudio on a Databricks cluster we get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt; java.lang.IllegalStateException: Cannot find the REPL id in Spark local properties.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # The current mode doesn't support transactional writes from different clusters.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # You can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # If this is disabled, writes to a single table must originate from a single cluster.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # Please check &lt;A href="https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq" target="test_blank"&gt;https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq&lt;/A&gt; for more details.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We've tried runtimes 8.1, 10.4, and 11.1 beta. The code runs only when spark.databricks.delta.multiClusterWrites.enabled is set to false.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What should we do to run with multi-cluster writes enabled?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Radu&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;[1]&lt;/P&gt;&lt;P&gt;&lt;I&gt;        sdf = SparkR::as.DataFrame(df)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        if (tolower(dataset_name) %in% SparkR::tableNames(db_name)) {&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # Append data. No need to specify partitioning, will use what is in the file.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::write.df(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'append',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mergeSchema = TRUE&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        } else {&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # First create the Databricks managed Delta Table&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::saveAsTable(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            tableName = tolower(paste0(db_name, '.', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'append',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mergeSchema = TRUE&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # Overwrite the table info with partitioning information&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::write.df(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'overwrite',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            overwriteSchema = TRUE,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            partitionBy = c('organization')&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        }&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 30 Nov 2022 09:59:51 GMT</pubDate>
    <dc:creator>64883</dc:creator>
    <dc:date>2022-11-30T09:59:51Z</dc:date>
    <item>
      <title>Support for Delta tables multicluster writes in Databricks cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/support-for-delta-tables-multicluster-writes-in-databricks/m-p/19753#M13286</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're using Databricks on AWS and have recently started using Delta tables.&lt;/P&gt;&lt;P&gt;We're using R.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The code below [1] works in a notebook, but when we run it from RStudio on a Databricks cluster we get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt; java.lang.IllegalStateException: Cannot find the REPL id in Spark local properties.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # The current mode doesn't support transactional writes from different clusters.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # You can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # If this is disabled, writes to a single table must originate from a single cluster.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;  # Please check &lt;A href="https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq" target="test_blank"&gt;https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq&lt;/A&gt; for more details.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We've tried runtimes 8.1, 10.4, and 11.1 beta. The code runs only when spark.databricks.delta.multiClusterWrites.enabled is set to false.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What should we do to run with multi-cluster writes enabled?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Radu&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;[1]&lt;/P&gt;&lt;P&gt;&lt;I&gt;        sdf = SparkR::as.DataFrame(df)&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        &lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        if (tolower(dataset_name) %in% SparkR::tableNames(db_name)) {&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # Append data. No need to specify partitioning, will use what is in the file.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::write.df(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'append',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mergeSchema = TRUE&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        } else {&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # First create the Databricks managed Delta Table&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::saveAsTable(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            tableName = tolower(paste0(db_name, '.', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'append',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mergeSchema = TRUE&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          # Overwrite the table info with partitioning information&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          SparkR::write.df(&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            sdf,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            path = tolower(paste0('dbfs:/user/hive/warehouse/', db_name, '.db/', dataset_name)),&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            mode = 'overwrite',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            source = 'delta',&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            overwriteSchema = TRUE,&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;            partitionBy = c('organization')&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;          )&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;I&gt;        }&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 09:59:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/support-for-delta-tables-multicluster-writes-in-databricks/m-p/19753#M13286</guid>
      <dc:creator>64883</dc:creator>
      <dc:date>2022-11-30T09:59:51Z</dc:date>
    </item>
    <item>
      <title>Re: Support for Delta tables multicluster writes in Databricks cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/support-for-delta-tables-multicluster-writes-in-databricks/m-p/65994#M32981</link>
      <description>&lt;P&gt;Sorry for being very late here.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you cannot set spark.databricks.delta.multiClusterWrites.enabled to false,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;you can try splitting this table into separate tables, one for each stream.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Apr 2024 10:21:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/support-for-delta-tables-multicluster-writes-in-databricks/m-p/65994#M32981</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2024-04-10T10:21:53Z</dc:date>
    </item>
  </channel>
</rss>

