Not able to run PipelineModel load functions on a Unity Catalog cluster
05-29-2025 07:54 AM
ISSUE -- Not able to run PipelineModel load functions on a Unity Catalog cluster
ERROR --[JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `sparkContext` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.
ANALYSIS --
In Databricks, the Spark session is one of two types, depending on the cluster:
<class 'pyspark.sql.connect.session.SparkSession'> (used in Unity Catalog-enabled clusters with Spark Connect)
<class 'pyspark.sql.session.SparkSession'> (used in standard clusters)
Why This Happens
Unity Catalog clusters often use Spark Connect, a client-server architecture in which the client works with pyspark.sql.connect.session.SparkSession and, as the error message states, cannot use attributes that depend on the JVM.
Non-Unity Catalog clusters use the traditional monolithic SparkSession (pyspark.sql.session.SparkSession).
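To confirm which of the two session types a notebook is attached to, a small check along these lines may help. It is only a sketch: the helper names is_spark_connect_session and describe_session are hypothetical, and the check relies solely on the module path of the session class shown above, so it does not import pyspark itself.

```python
def is_spark_connect_session(session) -> bool:
    """Return True if the session class lives under pyspark.sql.connect.

    Hypothetical helper: it inspects only the class's module path, which is
    'pyspark.sql.connect.session' for Spark Connect and
    'pyspark.sql.session' for the classic JVM-backed session.
    """
    module = type(session).__module__
    return module == "pyspark.sql.connect" or module.startswith("pyspark.sql.connect.")


def describe_session(session) -> str:
    """Build a one-line description of the session type for logging."""
    cls = type(session)
    kind = "Spark Connect" if is_spark_connect_session(session) else "classic (JVM-backed)"
    return f"{cls.__module__}.{cls.__qualname__} -> {kind}"


# In a Databricks notebook the variable `spark` is predefined, so usage would be:
# print(describe_session(spark))
```

Running this on both cluster types should show which session class each one hands to the notebook.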
When we run the code on a standard cluster and load the model file from a mount, it works.
On a Unity Catalog cluster, however, the Spark session is created through Spark Connect, and the code below does not work:
from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel
from pyspark.ml.classification import RandomForestClassificationModel
# Load the saved pipeline model from a Unity Catalog volume
model_path = "<volumnePath>/sparkML_pipeline2022_2_0.model"
# This call fails on the Spark Connect session with JVM_ATTRIBUTE_NOT_SUPPORTED
pipeline_model = PipelineModel.load(model_path)
The code does run on a single-user cluster. That is not recommended in our case, because multiple users will be using the same cluster.
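To make the failure explicit before the load is attempted, the session can be probed for the sparkContext attribute named in the error. The sketch below is an assumption-laden illustration, not a fix: has_jvm_spark_context and load_pipeline_model are hypothetical helpers, and the guard assumes a runtime where PipelineModel.load needs sparkContext, as the reported error suggests. On a runtime where the load works over Spark Connect, the guard would be unnecessary.

```python
def has_jvm_spark_context(session) -> bool:
    """Probe whether session.sparkContext can be accessed.

    On a Spark Connect session this attribute access raises an error
    (JVM_ATTRIBUTE_NOT_SUPPORTED), so any exception is treated as 'no'.
    """
    try:
        session.sparkContext
    except Exception:
        return False
    return True


def load_pipeline_model(session, path):
    """Load a saved PipelineModel, failing early with a readable message.

    Hypothetical wrapper: it assumes PipelineModel.load depends on
    sparkContext on this runtime, as in the error reported above.
    """
    if not has_jvm_spark_context(session):
        raise RuntimeError(
            "This session does not expose sparkContext (likely Spark Connect); "
            "PipelineModel.load is expected to fail on this cluster."
        )
    # Imported lazily so the probe above can run without pyspark installed.
    from pyspark.ml import PipelineModel
    return PipelineModel.load(path)


# Usage in a notebook, with `spark` predefined and model_path as defined earlier:
# pipeline_model = load_pipeline_model(spark, model_path)
```

This does not remove the limitation; it only separates "wrong session type" from other load failures such as a bad path.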
Please let me know if anyone can help fix this issue.