If I start an RStudio Server with a cluster init script as described here on a Unity Catalog cluster, the sparklyr connection fails with an error about a missing credential scope. I tried it both on 11.3 LTS and on 12.0 Beta, but only on a Personal Single Node cluster; I did not try it with a remote RStudio (via databricks-connect).
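For reference, this is roughly what runs inside RStudio Server on that cluster (a minimal sketch; the connection call is the standard sparklyr-on-Databricks approach, nothing cluster-specific is assumed):

library(sparklyr)
# Connect to the cluster's Spark session from RStudio Server on the driver
sc <- spark_connect(method = "databricks")
# Any subsequent read then fails with "Missing Credential Scope" on a UC cluster
sdf_sql(sc, "SELECT * FROM samples.nyctaxi.trips")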
This looks like a major incompatibility?
Something similar happens when I open two identical notebooks, each with the following cell:
library(sparklyr)
library(tidyverse)
sc <- spark_connect(method="databricks")
sdf <- sc %>% sdf_sql("SELECT * FROM samples.nyctaxi.trips")
sdf
If you run it in the first notebook, you get the expected result.
If you then run the second notebook on the same cluster (without restarting the cluster), you get the following error:
Error : org.apache.spark.SparkException: Missing Credential Scope.
at com.databricks.unity.UCSDriver$Manager.$anonfun$scope$1(UCSDriver.scala:103)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UCSDriver$Manager.scope(UCSDriver.scala:103)
at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
at com.databricks.unity.UnityCredentialScope$.getSAMRegistry(UnityCredentialScope.scala:120)
at com.databricks.unity.SAMRegistry$.getSAMOpt(SAMRegistry.scala:346)
at com.databricks.unity.CredentialScopeSQLHelper$.registerPathIfMissing(CredentialScopeSQLHelper.scala:204)
at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:853)
at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:774)
at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:754)
at com.databricks.sql.transaction.tahoe.DeltaLog$.forTable(DeltaLog.scala:701)
at com.databricks.sql.transaction.tahoe.DeltaLog$.$anonfun$forTableWithSnapshot$1(DeltaLog.scala:780)
at com.databricks.sql.transaction.tahoe.DeltaLog$.withFreshSnapshot(DeltaLog.scala:806)
at com.databricks.sql.transaction.tahoe.DeltaLog$.forTableWithSnapshot(DeltaLog.scala:780)
at com.databricks.sql.managedcatalog.SampleTable.readSchema(SampleTables.scala:109)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.$anonfun$getSampleTableMetadata$1(ManagedCatalogSessionCatalog.scala:955)
at scala.Option.map(Option.scala:230)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.getSampleTableMetadata(M[...]
This happens:
- Both on 11.3 LTS and 12.0 Beta
- Also if you only read from DBFS (see the sketch after this list): spark_read_delta(sc, 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta')
- Also if you have a multi-task job and run it on a job cluster.
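To rule out Unity Catalog tables as the trigger, the DBFS-only variant looks like this (a sketch of what the second notebook or second job task runs; the path is the public databricks-datasets copy already mentioned above):

library(sparklyr)
sc <- spark_connect(method = "databricks")
# Plain Delta read from DBFS, no Unity Catalog table involved --
# this still fails with "Missing Credential Scope" on the second connection
people <- spark_read_delta(sc, 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta')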
It does not happen in SparkR.
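For comparison, the equivalent SparkR cell works in both notebooks (a minimal sketch using the SparkR session that Databricks notebooks already provide):

library(SparkR)
# Databricks notebooks attach a SparkR session; this call just reuses it
sparkR.session()
df <- sql("SELECT * FROM samples.nyctaxi.trips")
head(df)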
There is already a Stack Overflow question about this: https://stackoverflow.com/questions/74575249/sparklyr-multiple-databricks-notebooks-second-connectio...