lingareddy_Alva
Esteemed Contributor

Hi @kamal_sharma2 

This is a well-known limitation when loading Spark ML models on Unity Catalog clusters. These clusters use Spark Connect, which doesn't support direct JVM access, and PipelineModel.load() requires that access.

Here is a solution to resolve this:

Solution 1: Use MLflow for Model Management:

import mlflow
import mlflow.spark

# If your model isn't already in MLflow, log and register it first:
# (Run this once on a standard cluster; it is kept inside a string here so it does not re-run)
"""
with mlflow.start_run() as run:
    # Log the fitted PipelineModel as an artifact of this run
    mlflow.spark.log_model(pipeline_model, "spark_pipeline_model")

# Register the logged model (use the captured run, since the run has ended here)
model_version = mlflow.register_model(
    f"runs:/{run.info.run_id}/spark_pipeline_model",
    "spark_pipeline_classifier"
)
"""

# Load the model in the Unity Catalog cluster:
model_uri = "models:/spark_pipeline_classifier/latest"  # or a specific version, e.g. ".../1"
loaded_model = mlflow.spark.load_model(model_uri)

# Use the model for predictions (test_df is your input DataFrame)
predictions = loaded_model.transform(test_df)
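
If you want to pin a specific version instead of "latest", a small helper can build the model URI for you. This is only a sketch: build_model_uri is a hypothetical helper name (not part of MLflow), and it assumes the standard MLflow URI forms models:/<name>/<version> and models:/<name>@<alias>. Whether "latest" resolves may depend on which model registry your workspace uses, so pinning a version or alias is the safer option.

```python
from typing import Optional


def build_model_uri(name: str,
                    version: Optional[int] = None,
                    alias: Optional[str] = None) -> str:
    """Build an MLflow model URI (hypothetical helper, not an MLflow API)."""
    if version is not None and alias is not None:
        raise ValueError("Pass either version or alias, not both")
    if version is not None:
        # Pin an exact registered version
        return f"models:/{name}/{version}"
    if alias is not None:
        # Refer to a registered alias such as "champion"
        return f"models:/{name}@{alias}"
    # Fall back to the form used in the example above
    return f"models:/{name}/latest"


model_uri = build_model_uri("spark_pipeline_classifier", version=1)
```

You can then pass model_uri to mlflow.spark.load_model(model_uri) exactly as in the snippet above.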

 

LR