I am running into an issue trying to use a Databricks Runtime 14.3 cluster with databricks-connect 14.3.
My cluster config:
{
    "autoscale": {
        "min_workers": 2,
        "max_workers": 10
    },
    "cluster_name": "Developer Cluster",
    "spark_version": "14.3.x-scala2.12",
    "spark_conf": {
        "spark.databricks.delta.preview.enabled": "true",
        "spark.databricks.service.server.enabled": "true"
    },
    "azure_attributes": {
        "first_on_demand": 1,
        "availability": "ON_DEMAND_AZURE",
        "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "driver_node_type_id": "Standard_DS3_v2",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {},
    "autotermination_minutes": 60,
    "enable_elastic_disk": true,
    "cluster_source": "UI",
    "init_scripts": [],
    "enable_local_disk_encryption": false,
    "data_security_mode": "NONE",
    "runtime_engine": "STANDARD"
}
Running databricks-connect test, I get the following output:
databricks-connect test
* PySpark is installed at /Users/user/projects/github.com/org/repo/.python/repo/lib/python3.10/site-packages/pyspark
* Checking SPARK_HOME
* Checking java version
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode)
WARNING: Java versions >8 are not supported by this SDK
* Testing scala command
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/07/25 12:27:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1721924845141).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 21.0.4)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
scala> import com.databricks.service.SparkClientManager
<console>:22: error: object databricks is not a member of package com
import com.databricks.service.SparkClientManager
^
scala> val serverConf = SparkClientManager.getForCurrentSession().getServerSparkConf
<console>:22: error: not found: value SparkClientManager
val serverConf = SparkClientManager.getForCurrentSession().getServerSparkConf
^
scala> val processIsolation = serverConf .get("spark.databricks.pyspark.enableProcessIsolation")
<console>:22: error: not found: value serverConf
val processIsolation = serverConf .get("spark.databricks.pyspark.enableProcessIsolation")
^
scala> if (!processIsolation.toBoolean) {
| spark.range(100).reduce((a,b) => Long.box(a + b))
| } else {
| spark.range(99*100/2).count()
| }
<console>:23: error: not found: value processIsolation
if (!processIsolation.toBoolean) {
^
scala>
|
scala> :quit
Traceback (most recent call last):
File "/Users/user/projects/github.com/org/repo/.python/repo/bin/databricks-connect", line 8, in <module>
sys.exit(main())
File "/Users/user/projects/github.com/org/repo/.python/repo/lib/python3.10/site-packages/pyspark/databricks_connect.py", line 311, in main
test()
File "/Users/user/projects/github.com/org/repo/.python/repo/lib/python3.10/site-packages/pyspark/databricks_connect.py", line 267, in test
raise ValueError("Scala command failed to produce correct result")
ValueError: Scala command failed to produce correct result
When I try to run tests against the cluster, I am told that a Spark session isn't running. However, I can run
spark.sparkContext.getConf().getAll() in a notebook and successfully get back the list of configs.
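For reference, this is roughly the kind of session I am trying to use from local code with databricks-connect 14.3 (a minimal sketch using the standard Spark Connect builder; the host, token, and cluster ID are resolved from environment variables or the Databricks CLI profile rather than hard-coded, and nothing project-specific is shown):

from databricks.connect import DatabricksSession

# Minimal sketch: build a Spark Connect session against the remote cluster.
# Host, token, and cluster ID are picked up from DATABRICKS_* environment
# variables or the local Databricks CLI profile, so nothing is hard-coded here.
spark = DatabricksSession.builder.getOrCreate()

# Simple sanity check that work actually runs on the cluster.
print(spark.range(100).count())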