Bug? Unity Catalog incompatible with sparklyr in RStudio (on driver) and also when used on one cluster from multiple notebooks?

thomann
New Contributor III

If I start an RStudio Server with a cluster init script as described here on a Unity Catalog cluster, the sparklyr connection fails with an error about a missing Credential Scope. I tried it both on 11.3 LTS and 12.0 Beta, and only on a Personal Single Node cluster. I did not try it with a remote RStudio (via Databricks Connect).

This looks like a major incompatibility?

Something similar happens when I open two identical notebooks, each with the following cell:

library(sparklyr)
library(tidyverse)
sc <- spark_connect(method="databricks")
 
sdf <- sc %>% sdf_sql("SELECT * FROM samples.nyctaxi.trips")
sdf

If you run it in the first notebook, you get the desired outcome.

If you then run it in the second notebook on the same cluster (without restarting the cluster), you get the following error:

Error : org.apache.spark.SparkException: Missing Credential Scope. 
	at com.databricks.unity.UCSDriver$Manager.$anonfun$scope$1(UCSDriver.scala:103)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.unity.UCSDriver$Manager.scope(UCSDriver.scala:103)
	at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
	at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
	at com.databricks.unity.UnityCredentialScope$.getSAMRegistry(UnityCredentialScope.scala:120)
	at com.databricks.unity.SAMRegistry$.getSAMOpt(SAMRegistry.scala:346)
	at com.databricks.unity.CredentialScopeSQLHelper$.registerPathIfMissing(CredentialScopeSQLHelper.scala:204)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:853)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:774)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.apply(DeltaLog.scala:754)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.forTable(DeltaLog.scala:701)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.$anonfun$forTableWithSnapshot$1(DeltaLog.scala:780)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.withFreshSnapshot(DeltaLog.scala:806)
	at com.databricks.sql.transaction.tahoe.DeltaLog$.forTableWithSnapshot(DeltaLog.scala:780)
	at com.databricks.sql.managedcatalog.SampleTable.readSchema(SampleTables.scala:109)
	at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.$anonfun$getSampleTableMetadata$1(ManagedCatalogSessionCatalog.scala:955)
	at scala.Option.map(Option.scala:230)
	at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.getSampleTableMetadata(M[...]

This happens:

  • Both on 11.3 LTS and 12.0 Beta
  • Also if you only read from dbfs: spark_read_delta(sc, 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta') (see the sketch after this list)
  • Also if you have a multi-task job and run it on a job cluster.
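
For reference, a minimal sketch of that DBFS-only variant, assuming it is run as the second R notebook attached to the same cluster (the path is the databricks-datasets sample quoted in the list above):

library(sparklyr)
sc <- spark_connect(method = "databricks")

# Reading a Delta table straight from DBFS (no Unity Catalog table involved)
# still triggers the Missing Credential Scope error in the second notebook
people <- spark_read_delta(
  sc,
  path = "dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta"
)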

It does not happen in SparkR.
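
For comparison, a rough SparkR equivalent of the sparklyr cell above; per the report it keeps working even from a second notebook on the same cluster (on Databricks the SparkR session is already configured, so sparkR.session() typically just reuses it):

library(SparkR)
sparkR.session()

# Same query against the Unity Catalog sample table, this time via SparkR
df <- sql("SELECT * FROM samples.nyctaxi.trips")
head(df)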

There is already a StackOverflow Question about this: https://stackoverflow.com/questions/74575249/sparklyr-multiple-databricks-notebooks-second-connectio...

5 REPLIES

thomann
New Contributor III

Hi @Kaniz Fatma! Maybe you can point us in the right direction?

Kaniz
Community Manager

Hi @Philipp Thomann​, Sure!

Let me get back to you asap.

Kaniz
Community Manager

Hi @Philipp Thomann, you must set up the AWS credentials on the RStudio server.

The error about a missing Credential Scope is related to AWS credentials and is likely caused by the RStudio server not being able to access the AWS credentials that Spark needs in order to access S3.

When you start an RStudio server on a Unity Catalog cluster, the AWS credentials are not automatically propagated to the RStudio server, so you need to set them up there yourself.
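
As a rough illustration only (not something confirmed in this thread, and the values below are placeholders): one common way to make AWS credentials visible to an R session is the standard AWS environment variables, set before connecting. Whether this addresses the Unity Catalog credential-scope error, or the Azure case described below, is not established here.

# Hypothetical sketch: export standard AWS credential environment variables
# in the RStudio R session before connecting with sparklyr
Sys.setenv(
  AWS_ACCESS_KEY_ID     = "<access-key-id>",
  AWS_SECRET_ACCESS_KEY = "<secret-access-key>",
  AWS_DEFAULT_REGION    = "<region>"
)

library(sparklyr)
sc <- spark_connect(method = "databricks")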

thomann
New Contributor III

Thanks @Kaniz Fatma​ for your answer!

Actually, we have the problem on Azure Databricks. We did not test it on AWS yet. Does that make a difference?

Also, as written in the question, the problem happens (actually more pressingly) directly in the Databricks notebooks themselves, if a second Databricks notebook using R connects to the same cluster. This happens both on interactive clusters and when several steps in a Workflow use R notebooks with Unity Catalog.

Could you give us a pointer (in the documentation or source code?) to which AWS/Azure credentials you mean and how we should set them up?

Best, Philipp

kunalmishra9
New Contributor III

Have run into this issue as well. Let me know if there was any resolution

 