I have a Databricks workspace in GCP and I am using a cluster with Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12). I am trying to set the checkpoint directory location with the following command in a notebook:
spark.sparkContext.setCheckpointDir("/FileStore/checkpoint")
The SparkSession used here is the default one initialized within the notebook. But I get the following error:
[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `sparkContext` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
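Printing the type of the default spark object in the notebook hints at what is going on (the class name in the comment is what I believe a Spark Connect session looks like in PySpark 3.5, so treat it as my observation, not documentation):

print(type(spark))
# <class 'pyspark.sql.connect.session.SparkSession'>  -- a Spark Connect session, I believe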
It seems that from Spark 3.4 onwards we get a Spark Connect session object instead of the regular SparkSession, and it does not have the sparkContext attribute. So, as suggested by the error, I try to create a Spark session explicitly and then set the checkpoint directory:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

spark.sparkContext.setCheckpointDir("/FileStore/checkpoint")
But I get the exact same error. How do I use the sparkContext attribute with this version of Spark and DBR?
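In case it is useful, here is a minimal check (the name new_spark is mine) of what I think is happening, namely that getOrCreate() just hands back the existing Connect session rather than building a classic JVM-backed one:

from pyspark.sql import SparkSession

new_spark = SparkSession.builder.getOrCreate()
print(type(new_spark))     # I expect the same Connect session class as above
print(new_spark is spark)  # I expect True, i.e. no new regular session was created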