02-21-2024 11:15 PM - edited 02-21-2024 11:16 PM
Hi,
I'm using Databricks Connect to run Scala code from IntelliJ on a Databricks single node cluster.
Even with the simplest code, I'm experiencing this error:
org.apache.spark.SparkException: grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2957078008500330718, local class serialVersionUID = 7842785351289879144
Reading and displaying DataFrames works, but as soon as I apply even the simplest row-level transformation, it fails.
Minimal code example to reproduce:
val df = spark.read.table("samples.nyctaxi.trips")
import spark.implicits._
df
  .map(_.getAs[Int]("dropoff_zip"))
  .show(10)
Happens with both 13.3 LTS and 14.3 LTS. The Databricks Connect dependency version matches the cluster version; Scala is 2.12.15, JDK is Azul Zulu 8.
Same code works fine in a notebook.
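For context on what "local class incompatible" means: the client and server each have their own compiled copy of org.apache.spark.sql.types.StructType, and when a class does not declare an explicit serialVersionUID, the JVM derives one from the class structure, so two different builds produce two different UIDs. A minimal pure-JVM sketch (PayloadV1/PayloadV2 are hypothetical stand-ins for two builds of "the same" class, no Spark involved):

```scala
import java.io.ObjectStreamClass

// Hypothetical stand-ins for two builds of the same class. Neither
// declares an explicit serialVersionUID, so the JVM derives one from
// the class structure; any structural difference yields a different
// UID and a "local class incompatible" InvalidClassException when one
// side deserializes what the other side serialized.
class PayloadV1(val zip: Int) extends Serializable
class PayloadV2(val zip: Int, val fare: Double) extends Serializable

object SerialUidDemo {
  // Ask the serialization machinery for the effective serialVersionUID.
  def uidOf(c: Class[_]): Long = ObjectStreamClass.lookup(c).getSerialVersionUID

  def main(args: Array[String]): Unit = {
    println(uidOf(classOf[PayloadV1]))
    println(uidOf(classOf[PayloadV2]))
  }
}
```

This suggests the client-side jar and the server are running structurally different builds of the same Spark class.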
02-22-2024 04:44 AM
I forgot to add that I configured the session as described in the docs:
val sourceLocation = getClass.getProtectionDomain.getCodeSource.getLocation.toURI
DatabricksSession.builder()
  .clusterId(clusterId)
  .addCompiledArtifacts(sourceLocation)
  .getOrCreate()
02-22-2024 05:34 AM
Can you check your build.sbt?
https://docs.databricks.com/en/dev-tools/databricks-connect/scala/index.html
Also, in your session builder I do not see the remote() or sdkConfig() part.
Can you go through the docs and check everything?
It should work; I checked it myself last week.
02-22-2024 06:20 AM
I left that out, my connection looks like this:
val spark: SparkSession =
  DatabricksSession.builder()
    .host("xxx")
    .token("xxx")
    .clusterId("xxx")
    .addCompiledArtifacts(sourceLocation) // tried with and without this
    .getOrCreate()
02-22-2024 05:39 AM
I notice you call the addCompiledArtifacts API; that is used for UDFs packaged in a JAR that is installed on the cluster.
https://docs.databricks.com/en/dev-tools/databricks-connect/scala/udf.html
Is that the case for you? It seems you only want to run the default example.
02-22-2024 06:25 AM - edited 02-22-2024 06:29 AM
The documentation states: "The same mechanism described in the preceding section for UDFs also applies to typed Dataset APIs.".
My
map(_.getAs[Int]("dropoff_zip"))
is like a UDF, so that's why I'm adding the compiled source.
(I also had to do it in a similar way when trying Spark Connect against a Spark 3.5.0 cluster, and it ran successfully).
By the way, as soon as I leave out the .map(), it runs, so the error has to do with user-defined functions / the typed Dataset API.
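To illustrate why the compiled classes are needed at all: the lambda passed to map() is shipped to the cluster via Java serialization, and deserializing it on the server requires the class that defines it, which is what addCompiledArtifacts uploads. A minimal pure-JVM sketch of that round trip (no Spark; roundTrip is a hypothetical helper, and it works locally only because the defining class is already on the classpath):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object ClosureDemo {
  // Serialize an object to bytes and read it back, roughly what a
  // Spark Connect client and server do across the wire for typed
  // Dataset operations like map().
  def roundTrip(obj: AnyRef): AnyRef = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject()
  }

  def main(args: Array[String]): Unit = {
    // Scala 2.12+ function literals are serializable by default;
    // this mirrors the shape of _.getAs[Int]("dropoff_zip").
    val f: Int => Int = _ * 5
    val g = roundTrip(f).asInstanceOf[Int => Int]
    // Succeeds here because the lambda's defining class is local;
    // on a remote server it would fail without the uploaded artifacts.
    println(g(7))
  }
}
```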
02-22-2024 07:14 AM
I see, so it can't be the connection.
Does importing udf help? Just guessing here (after reading the docs for the typed Dataset API).
02-22-2024 08:04 AM
Using a proper UDF does indeed work:
val myUdf = udf { row: Int =>
  row * 5
}
df.withColumn("dropoff_zip_processed", myUdf($"dropoff_zip"))
It's just the Dataset API that doesn't work.
02-22-2024 11:14 PM
So this is clearly a bug in Databricks Connect. I'm not on a support plan, so I'm not sure how to report it...
02-23-2024 12:03 AM
I also tried on a shared cluster, and there the error message is pretty clear:
org.sparkproject.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: User defined code is not yet supported.
02-23-2024 12:29 AM
That is pretty clear indeed.
But according to the docs it should be supported.
Since Scala support only went GA on the 1st of February 2024, chances are we are talking about a bug here.
Are you sure you added the correct Databricks Connect jar (14.3)?
02-23-2024 02:16 AM - edited 02-23-2024 02:18 AM
Yes, I tried both 14.3.0 and 14.3.1.
I'm also encountering the same (or a very similar) error when running against a local Spark Connect cluster. When I replace databricks-connect with spark-connect, it works.
I sent a bug report to help@databricks.com.
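For anyone wanting to reproduce the comparison, the swap is a single dependency change in build.sbt, roughly as below (exact coordinates and versions are from memory, verify them against the docs and Maven Central):

```scala
// Databricks Connect client (fails on the typed Dataset API for me):
// libraryDependencies += "com.databricks" % "databricks-connect" % "14.3.1"

// Plain Spark Connect client (works):
libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.0"
```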
02-23-2024 03:40 AM
Nice find.
Definitely a bug if it works in spark-connect.
02-23-2024 07:38 AM
I just hope Databricks will pay attention to it.