01-05-2025 02:25 PM
I recently upgraded my Databricks Connect version to 15.4 and got set up for Serverless, but ran into the following error when running the standard code to enable Arrow in PySpark:
>>> spark.conf.set(key='spark.sql.execution.arrow.pyspark.enabled', value='true')
pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.execution.arrow.pyspark.enabled is not available. SQLSTATE: 42K0I
JVM stacktrace:
org.apache.spark.sql.AnalysisException
at com.databricks.sql.connect.SparkConnectConfig$.assertConfigAllowed(SparkConnectConfig.scala:219)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler$RuntimeConfigWrapper.set(SparkConnectConfigHandler.scala:88)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:230)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:228)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:228)
at org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:201)
at org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:123)
at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:805)
at grpc_shaded.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.$anonfun$onHalfClose$1(AuthenticationInterceptor.scala:310)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$3(RequestContext.scala:286)
at com.databricks.spark.connect.service.RequestContext$.com$databricks$spark$connect$service$RequestContext$$withLocalProperties(RequestContext.scala:473)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$2(RequestContext.scala:286)
at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:29)
at com.databricks.spark.util.UniverseAttributionContextWrapper.withValue(AttributionContextUtils.scala:228)
at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$1(RequestContext.scala:285)
at com.databricks.spark.connect.service.RequestContext.withContext(RequestContext.scala:298)
at com.databricks.spark.connect.service.RequestContext.runWith(RequestContext.scala:278)
at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.onHalfClose(AuthenticationInterceptor.scala:310)
at grpc_shaded.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at grpc_shaded.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at grpc_shaded.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at grpc_shaded.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:351)
at grpc_shaded.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
at grpc_shaded.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at grpc_shaded.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:840)
When I disabled Serverless and connected to a standard cluster, there was no error. So the ask is to either:
1) fix enabling Arrow on PySpark, or
2) improve the error raised when the conf is specified, possibly by explicitly warning that it's not possible on Serverless compute if this restriction is intended. If it's not intended, this should be added to the linked documentation in both locations.
For now, I've wrapped that bit of code in a try/except to gracefully handle the error no matter what I'm connecting to.
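For reference, here's a minimal sketch of that workaround (my own reconstruction, assuming a Databricks Connect session built the usual way; catching the base pyspark.errors.AnalysisException should also cover the connect subclass shown in the trace above):

from databricks.connect import DatabricksSession
from pyspark.errors import AnalysisException

spark = DatabricksSession.builder.getOrCreate()

# Try to enable Arrow-accelerated pandas conversion. On Serverless this conf
# is not settable, so the server rejects it with CONFIG_NOT_AVAILABLE; on a
# classic cluster the call succeeds.
try:
    spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', 'true')
except AnalysisException:
    pass  # Serverless compute: the conf can't be set, so continue without it.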
01-05-2025 02:48 PM
Serverless is currently limited to only a few Spark confs, as mentioned in the docs:
Most Apache Spark compute configurations. For a list of supported configurations, see Supported Spark configuration parameters.
Reference doc: https://docs.databricks.com/en/compute/serverless/limitations.html
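To illustrate the allow-list behavior (a sketch against a Serverless session; I'm assuming spark.sql.session.timeZone is still on the supported list in the doc above):

# Allow-listed conf: accepted on Serverless.
spark.conf.set('spark.sql.session.timeZone', 'UTC')

# Non-allow-listed conf: rejected with [CONFIG_NOT_AVAILABLE], as in the
# stack trace above.
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', 'true')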
01-06-2025 01:23 PM
Gotcha, thanks! Missed it in the limitations.