03-26-2025 01:29 PM
Hello!! Currently I have RStudio Server installed on a Dedicated cluster on Azure Databricks; here are the specs:
I must emphasize the Access mode: Manual and Dedicated to a Group.
Here, we install RStudio using a notebook with the following code:
%sh
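# Install the Teradata ODBC driver if the .deb package exists in the workspace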
if [ -f "/Workspace/Shared/ODBC_teradata/tdodbc1720_17_20_00_11_1_x86_64.deb" ]
then
  sudo dpkg -i "/Workspace/Shared/ODBC_teradata/tdodbc1720_17_20_00_11_1_x86_64.deb"
else
  echo "The file /Workspace/Shared/ODBC_teradata/tdodbc1720_17_20_00_11_1_x86_64.deb does not exist"
fi
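# Install RStudio Server's Ubuntu 22.04 (jammy) dependency packages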
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libssl1.1_1.1.1l-1ubuntu1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libc6-i386_2.35-0ubuntu3.9_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/lib32gcc-s1_12.3.0-1ubuntu1~22.04_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/lib32stdc++6_12.3.0-1ubuntu1~22.04_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libllvm14_14.0.0-1ubuntu1.1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libclang-common-14-dev_14.0.0-1ubuntu1.1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libgc1_8.0.6-1.1build1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libobjc4_12.3.0-1ubuntu1~22.04_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libobjc-11-dev_11.4.0-1ubuntu1~22.04_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libclang1-14_14.0.0-1ubuntu1.1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libclang-14-dev_14.0.0-1ubuntu1.1_amd64.deb'
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy/libclang-dev_14.0-55~exp2_amd64.deb'
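# Install RStudio Server itself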
sudo dpkg -i '/Workspace/Shared/RStudio_instalacion/rstudio-server-2024.09.0-375-amd64.deb'
After the installation process using the notebook associated with the cluster, we have RStudio and it works fine, but after roughly one or two days we start getting the following error:
How to fix this first issue?
Also, on the other hand, when I run:
sc <- spark_connect(method = "databricks")
I get the following error:
An error occurred while the 'sparklyr' package was updating the RStudio Connections pane:
Error in spark_web.spark_gateway_connection(scon): spark_web is not available while connecting through an sparklyr gateway
If necessary, these warnings can be squelched by setting `options(rstudio.connectionObserver.errorsSuppressed = TRUE)`.
Warning messages:
1: In file.create(to[okay]) :
cannot create file '/usr/local/lib/R/site-library/sparklyr/java//sparklyr-2.2-2.11.jar', reason 'Permission denied'
2: In file.create(to[okay]) :
cannot create file '/usr/local/lib/R/site-library/sparklyr/java//sparklyr-2.1-2.11.jar', reason 'Permission denied'
How to fix this second issue?
I really appreciate any help!
Thanks!!
03-27-2025 12:36 PM
Hello! It's me again. I'm also getting the following error after testing a connection to Databricks using sparklyr:
Error:
! java.lang.IllegalStateException: No Unity API token found in Unity Scope
Run `sparklyr::spark_last_error()` to see the full Spark error (multiple lines)
To use the previous style of error message set `options("sparklyr.simple.errors" = TRUE)`
Run `rlang::last_trace()` to see where the error occurred.
> sparklyr::spark_last_error()
java.lang.IllegalStateException: No Unity API token found in Unity Scope
at com.databricks.unity.UnityCatalogClientHelper$.$anonfun$getToken$2(UnityCatalogClientHelper.scala:47)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UnityCatalogClientHelper$.getToken(UnityCatalogClientHelper.scala:47)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getCatalogProto$1(ManagedCatalogClientImpl.scala:546)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$2(ManagedCatalogClientImpl.scala:6872)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:6871)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:37)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:35)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:216)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:6852)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getCatalogProto(ManagedCatalogClientImpl.scala:531)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getCatalog(ManagedCatalogClientImpl.scala:522)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.$anonfun$getCatalogMetadata$5(ManagedCatalogCommon.scala:440)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.getCatalogMetadata(ManagedCatalogCommon.scala:438)
at com.databricks.sql.managedcatalog.NonPermissionEnforcingManagedCatalog.super$getCatalogMetadata(NonPermissionEnforcingManagedCatalog.scala:178)
at com.databricks.sql.managedcatalog.NonPermissionEnforcingManagedCatalog.$anonfun$getCatalogMetadata$1(NonPermissionEnforcingManagedCatalog.scala:178)
at com.databricks.sql.managedcatalog.NonPermissionEnforcingManagedCatalog.withDbInternalCatalogErrorHandling(NonPermissionEnforcingManagedCatalog.scala:220)
at com.databricks.sql.managedcatalog.NonPermissionEnforcingManagedCatalog.getCatalogMetadata(NonPermissionEnforcingManagedCatalog.scala:178)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.catalogExists(ManagedCatalogCommon.scala:374)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$catalogExists$1(ProfiledManagedCatalog.scala:83)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:1179)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.$anonfun$profile$1(ProfiledManagedCatalog.scala:63)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.profile(ProfiledManagedCatalog.scala:62)
at com.databricks.sql.managedcatalog.ProfiledManagedCatalog.catalogExists(ProfiledManagedCatalog.scala:83)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.catalogExists(ManagedCatalogSessionCatalog.scala:499)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.requireCtExists(ManagedCatalogSessionCatalog.scala:388)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.setCurrentCatalog(ManagedCatalogSessionCatalog.scala:473)
at com.databricks.sql.DatabricksCatalogManager.setCurrentCatalog(DatabricksCatalogManager.scala:132)
at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$1(SetCatalogAndNamespaceExec.scala:35)
at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$1$adapted(SetCatalogAndNamespaceExec.scala:35)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.run(SetCatalogAndNamespaceExec.scala:35)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$2(V2CommandExec.scala:48)
at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:189)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.$anonfun$result$1(V2CommandExec.scala:48)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:47)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:45)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:56)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$5(QueryExecution.scala:425)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:425)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:194)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:425)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$10(SQLExecution.scala:475)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:813)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:334)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:205)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:750)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:421)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1219)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:417)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:355)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:414)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:388)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:505)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:505)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:379)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:375)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:481)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:388)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:436)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:388)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:314)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:311)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:343)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:131)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1217)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1217)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:122)
at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:989)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1210)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:973)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:1012)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:1045)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke.invoke(invoke.scala:161)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:141)
at sparklyr.StreamHandler.read(stream.scala:62)
at sparklyr.BackendHandler.$anonfun$channelRead0$1(handler.scala:60)
at scala.util.control.Breaks.breakable(Breaks.scala:42)
at sparklyr.BackendHandler.channelRead0(handler.scala:41)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
Any help will be appreciated!
a week ago
You’re seeing two key issues with your RStudio Server on Azure Databricks:
1. RStudio stops working after 1–2 days.
2. You get permission errors using sparklyr and can't update the Connections pane.
Let’s address each:
Issue 1: RStudio stops working after 1–2 days
This often happens due to how Databricks clusters work:
Clusters are Ephemeral: When a cluster is terminated or restarted (either by policy, idle timeout, or an admin), all software and changes not baked into the cluster’s custom image are lost. Your manual RStudio installation via notebooks is not persistent.
Manual Installs are Non-Persistent: When the cluster spins down, the next startup will run on a fresh VM, losing your RStudio install.
How to Fix
You need to make RStudio part of the cluster’s initialization so it’s always present:
Use an Init Script:
Place a shell script containing your install commands in DBFS or in the workspace, then configure your cluster to run it at startup.
This ensures RStudio and its dependencies are re-installed every time the cluster starts, so it’s always present.
Example Init Script Steps:
Move your install commands into a .sh script file.
Place the script in DBFS (e.g., /dbfs/FileStore/scripts/rstudio-install.sh).
In the cluster configuration, add the script as a cluster-scoped init script (a minimal sketch follows below).
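Here is a minimal init-script sketch based on the install commands from your notebook. It assumes the /Workspace/Shared paths from your post are also readable while the cluster is starting; if they are not, copy the .deb files to DBFS or a volume and adjust the paths. Cluster-scoped init scripts run as root, so sudo is not needed:
#!/bin/bash
# rstudio-install.sh - cluster-scoped init script (runs as root)
set -e

# Locations taken from your notebook; adjust if you move the .deb files
TD_DEB='/Workspace/Shared/ODBC_teradata/tdodbc1720_17_20_00_11_1_x86_64.deb'
LIB_DIR='/Workspace/Shared/RStudio_instalacion/bibliotecas ubuntu jammy'
RSTUDIO_DEB='/Workspace/Shared/RStudio_instalacion/rstudio-server-2024.09.0-375-amd64.deb'

# Optional Teradata ODBC driver
if [ -f "$TD_DEB" ]; then
  dpkg -i "$TD_DEB"
fi

# Dependency packages, in the same order as the notebook
for deb in \
  libssl1.1_1.1.1l-1ubuntu1_amd64.deb \
  libc6-i386_2.35-0ubuntu3.9_amd64.deb \
  lib32gcc-s1_12.3.0-1ubuntu1~22.04_amd64.deb \
  lib32stdc++6_12.3.0-1ubuntu1~22.04_amd64.deb \
  libllvm14_14.0.0-1ubuntu1.1_amd64.deb \
  libclang-common-14-dev_14.0.0-1ubuntu1.1_amd64.deb \
  libgc1_8.0.6-1.1build1_amd64.deb \
  libobjc4_12.3.0-1ubuntu1~22.04_amd64.deb \
  libobjc-11-dev_11.4.0-1ubuntu1~22.04_amd64.deb \
  libclang1-14_14.0.0-1ubuntu1.1_amd64.deb \
  libclang-14-dev_14.0.0-1ubuntu1.1_amd64.deb \
  libclang-dev_14.0-55~exp2_amd64.deb
do
  dpkg -i "$LIB_DIR/$deb"
done

# RStudio Server itself
dpkg -i "$RSTUDIO_DEB"
Save it (for example as /dbfs/FileStore/scripts/rstudio-install.sh, as above) and point the cluster's init script setting at that location.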
Reference:
Databricks Init Scripts:
https://docs.databricks.com/en/clusters/init-scripts.html
Issue 2: sparklyr "Permission denied" errors
You see:
cannot create file '/usr/local/lib/R/site-library/sparklyr/java//sparklyr-2.2-2.11.jar', reason 'Permission denied'
This is due to how you’re installing R packages and user permissions:
Default Library Path is Root-Owned: On Databricks, /usr/local/lib/R/site-library is typically owned by root, but RStudio users run as a non-root user.
sparklyr Tries to Write .jar Files: When you load sparklyr, it tries to write required Java/JAR files to its library path and fails due to lack of permission.
Solution is to Use User Library Paths: Install R packages in a user writeable location and run R sessions with this as the library path.
How to Fix
Change R Library Path:
In your R session or RStudio, set the library path to a writeable directory. For example:
# At the top of your R scripts or in your .Rprofile
user_lib <- "/databricks/driver/R/my-user-lib"
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)  # make sure it exists
.libPaths(c(user_lib, .libPaths()))  # prepend so packages install to and load from it
Make sure this directory is writeable by the RStudio user.
Alternatively, install R packages using:
install.packages("sparklyr", lib="/databricks/driver/R/my-user-lib")
Do Not Install R Packages as Root via shell or sudo unless you are sure all users run as root (not recommended on Databricks).
The sparklyr Connections pane warning
This is expected when using sparklyr with Databricks, as the Connections pane in RStudio might not update correctly under gateway/proxy setups. As the warning suggests, you can suppress these messages with:
options(rstudio.connectionObserver.errorsSuppressed = TRUE)
But this doesn't affect actual Spark functionality—it only impacts the UI pane.
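For example, you can set the option once per session before connecting (this only silences the Connections pane message; it does not change Spark behaviour):
options(rstudio.connectionObserver.errorsSuppressed = TRUE)  # UI warning only
library(sparklyr)
sc <- spark_connect(method = "databricks")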
Summary
| Issue | Likely Cause | Fix |
|---|---|---|
| RStudio stops working after 1–2 days | Cluster is re-imaged/reset; manual install is lost | Use a cluster init script for RStudio installation |
| sparklyr “Permission denied” error | R is trying to write .jar to a root-owned directory | Set R package library path to a user writeable location |
| sparklyr connection pane warning | UI limitations with sparklyr in Databricks | Suppress warning with options(rstudio.connectionObserver.errorsSuppressed = TRUE) as needed |