I recently upgraded Databricks Connect to version 15.4 and set it up for Serverless, but I ran into the following error when running the standard code to enable Arrow in PySpark:
>>> spark.conf.set(key='spark.sql.execution.arrow.pyspark.enabled', value='true')
pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.execution.arrow.pyspark.enabled is not available. SQLSTATE: 42K0I
JVM stacktrace:
org.apache.spark.sql.AnalysisException
at com.databricks.sql.connect.SparkConnectConfig$.assertConfigAllowed(SparkConnectConfig.scala:219)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler$RuntimeConfigWrapper.set(SparkConnectConfigHandler.scala:88)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:230)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:228)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:228)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:201)
at org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:123)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:805)
at grpc_shaded.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.$anonfun$onHalfClose$1(AuthenticationInterceptor.scala:310)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$3(RequestContext.scala:286)
at com.databricks.spark.connect.service.RequestContext$.com$databricks$spark$connect$service$RequestContext$$withLocalProperties(RequestContext.scala:473)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$2(RequestContext.scala:286)
at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:29)
at com.databricks.spark.util.UniverseAttributionContextWrapper.withValue(AttributionContextUtils.scala:228)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$1(RequestContext.scala:285)
at com.databricks.spark.connect.service.RequestContext.withContext(RequestContext.scala:298)
at com.databricks.spark.connect.service.RequestContext.runWith(RequestContext.scala:278)
at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.onHalfClose(AuthenticationInterceptor.scala:310)
at grpc_shaded.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at grpc_shaded.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at grpc_shaded.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at grpc_shaded.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:351)
at grpc_shaded.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
at grpc_shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at grpc_shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:840)
When I disabled Serverless and connected to a standard cluster, there was no error. So the ask is to either
1) fix enabling Arrow on PySpark under Serverless,
or
2) handle the error gracefully when the conf is set, and possibly warn explicitly that it's not possible on Serverless compute if this restriction is intended. If it is intended, this limitation should also be noted in the linked documentation in both locations.
For now, I've wrapped that bit of code in a try/except so the error is handled gracefully no matter what compute I'm connecting to.
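For reference, here's a minimal sketch of that workaround, assuming spark is an already-created Databricks Connect session. Catching the broad AnalysisException (which the Spark Connect variant subclasses) is my choice here, not anything Databricks prescribes:

from pyspark.errors import AnalysisException

try:
    # Serverless rejects this config with CONFIG_NOT_AVAILABLE;
    # classic clusters accept it, so just attempt the set.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
except AnalysisException as exc:
    # Assumption: any AnalysisException here is the CONFIG_NOT_AVAILABLE
    # case seen on Serverless; log it and continue rather than fail.
    print(f"Skipping Arrow config (not settable on this compute): {exc}")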