<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Not able to run Pipeline Model load functions unity catalog cluster in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120867#M4101</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/166447"&gt;@kamal_sharma2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;This is a well-known limitation when working with Unity Catalog clusters and Spark ML models. The issue occurs because Spark Connect (used on Unity Catalog clusters) doesn't support direct JVM access, which PipelineModel.load() requires.&lt;/P&gt;&lt;P class=""&gt;Here is a solution:&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Solution 1: Use MLflow for Model Management&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;import mlflow&lt;BR /&gt;import mlflow.spark&lt;BR /&gt;from mlflow.tracking import MlflowClient&lt;/P&gt;&lt;P&gt;# If your model isn't already in MLflow, register it first:&lt;BR /&gt;# (Run this once on a standard cluster)&lt;BR /&gt;"""&lt;BR /&gt;with mlflow.start_run():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mlflow.spark.log_model(pipeline_model, "spark_pipeline_model")&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Register the model (inside the run, so mlflow.active_run() is set)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;client = MlflowClient()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;model_version = mlflow.register_model(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;f"runs:/{mlflow.active_run().info.run_id}/spark_pipeline_model",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"spark_pipeline_classifier"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;BR /&gt;"""&lt;/P&gt;&lt;P&gt;# Load the model on the Unity Catalog cluster:&lt;BR /&gt;model_uri = "models:/spark_pipeline_classifier/latest"&amp;nbsp; # or a specific version&lt;BR /&gt;loaded_model = mlflow.spark.load_model(model_uri)&lt;/P&gt;&lt;P&gt;# Use the model for predictions&lt;BR /&gt;predictions = loaded_model.transform(test_df)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 04 Jun 2025 01:27:53 GMT</pubDate>
    <dc:creator>lingareddy_Alva</dc:creator>
    <dc:date>2025-06-04T01:27:53Z</dc:date>
    <item>
      <title>Not able to run Pipeline Model load functions unity catalog cluster</title>
      <link>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120560#M4093</link>
      <description>&lt;P&gt;ISSUE -- Not able to run PipelineModel load functions on a Unity Catalog cluster&lt;/P&gt;&lt;P&gt;ERROR -- [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `sparkContext` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit &lt;A href="https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession&lt;/A&gt; for creating regular Spark Session in detail.&lt;/P&gt;&lt;P&gt;ANALYSIS --&lt;BR /&gt;In Databricks, the Spark session class differs by cluster type:&lt;/P&gt;&lt;P&gt;&amp;lt;class 'pyspark.sql.connect.session.SparkSession'&amp;gt; (used on Unity Catalog-enabled clusters with Spark Connect)&lt;BR /&gt;&amp;lt;class 'pyspark.sql.session.SparkSession'&amp;gt; (used on standard clusters)&lt;/P&gt;&lt;P&gt;Why this happens:&lt;BR /&gt;Unity Catalog clusters often use Spark Connect, a client-server architecture where the client uses pyspark.sql.connect.SparkSession.&lt;BR /&gt;Non-Unity Catalog clusters use the traditional monolithic SparkSession (pyspark.sql.SparkSession).&lt;/P&gt;&lt;P&gt;When we run the code on a standard cluster and take the model file from mounts, it works,&lt;/P&gt;&lt;P&gt;but on a Unity Catalog cluster the Spark session is created via Spark Connect, and the code below fails:&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;BR /&gt;from pyspark.ml.classification import RandomForestClassificationModel&lt;BR /&gt;from datetime import datetime&lt;BR /&gt;from pyspark.ml import PipelineModel&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;# Load the model from a Unity Catalog volume&lt;BR /&gt;model_path = "&amp;lt;volumePath&amp;gt;/sparkML_pipeline2022_2_0.model"&lt;BR /&gt;pipeline_model = PipelineModel.load(model_path)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The code does run on a single-user cluster, but this is not recommended, as multiple users will be sharing the same cluster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know if anyone can help fix this issue.&lt;/P&gt;</description>
      <pubDate>Thu, 29 May 2025 14:54:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120560#M4093</guid>
      <dc:creator>kamal_sharma2</dc:creator>
      <dc:date>2025-05-29T14:54:26Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to run Pipeline Model load functions unity catalog cluster</title>
      <link>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120867#M4101</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/166447"&gt;@kamal_sharma2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;This is a well-known limitation when working with Unity Catalog clusters and Spark ML models. The issue occurs because Spark Connect (used on Unity Catalog clusters) doesn't support direct JVM access, which PipelineModel.load() requires.&lt;/P&gt;&lt;P class=""&gt;Here is a solution:&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Solution 1: Use MLflow for Model Management&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;import mlflow&lt;BR /&gt;import mlflow.spark&lt;BR /&gt;from mlflow.tracking import MlflowClient&lt;/P&gt;&lt;P&gt;# If your model isn't already in MLflow, register it first:&lt;BR /&gt;# (Run this once on a standard cluster)&lt;BR /&gt;"""&lt;BR /&gt;with mlflow.start_run():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mlflow.spark.log_model(pipeline_model, "spark_pipeline_model")&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Register the model (inside the run, so mlflow.active_run() is set)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;client = MlflowClient()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;model_version = mlflow.register_model(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;f"runs:/{mlflow.active_run().info.run_id}/spark_pipeline_model",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"spark_pipeline_classifier"&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;BR /&gt;"""&lt;/P&gt;&lt;P&gt;# Load the model on the Unity Catalog cluster:&lt;BR /&gt;model_uri = "models:/spark_pipeline_classifier/latest"&amp;nbsp; # or a specific version&lt;BR /&gt;loaded_model = mlflow.spark.load_model(model_uri)&lt;/P&gt;&lt;P&gt;# Use the model for predictions&lt;BR /&gt;predictions = loaded_model.transform(test_df)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 01:27:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120867#M4101</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-04T01:27:53Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to run Pipeline Model load functions unity catalog cluster</title>
      <link>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120899#M4103</link>
      <description>&lt;P&gt;Thanks for your reply LRALVA. When I tried to run&amp;nbsp;&lt;SPAN&gt;&lt;STRONG&gt;mlflow.spark.log_model(pipeline_model, "spark_pipeline_model")&lt;/STRONG&gt; on a model that I had saved with random forest a long time ago, log_model gave an error that the model is not a Spark flavor. So I tried &lt;STRONG&gt;mlflow.sklearn.log_model(pipeline_model, "spark_pipeline_model")&lt;/STRONG&gt;, which worked, and I was able to register the model under Models, but when I load it back and run the transform function on it, I get this error:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;AttributeError: &lt;/SPAN&gt;'str' object has no attribute 'transform'&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Code I am running to load it:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;import mlflow&lt;BR /&gt;import mlflow.spark&lt;BR /&gt;from mlflow.tracking import MlflowClient&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;# Load the model on the Unity Catalog cluster:&lt;BR /&gt;model_uri = "models:/sparkML_rf2022_2_0/latest"&amp;nbsp; # or a specific version&lt;BR /&gt;loaded_model = mlflow.sklearn.load_model(model_uri)&lt;BR /&gt;df = spark.read.parquet('&amp;lt;data/path&amp;gt;')&lt;BR /&gt;# Use the model for predictions&lt;BR /&gt;predictions = loaded_model.transform(df)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 10:21:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120899#M4103</guid>
      <dc:creator>kamal_sharma2</dc:creator>
      <dc:date>2025-06-04T10:21:25Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to run Pipeline Model load functions unity catalog cluster</title>
      <link>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120950#M4106</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/166447"&gt;@kamal_sharma2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The issue you're encountering is a mismatch between model flavors and loading methods.&lt;BR /&gt;When you used mlflow.sklearn.log_model() on a Spark ML PipelineModel, you logged it as a scikit-learn model even though it is actually a Spark ML model. This causes type confusion when loading.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Solution: Re-log the Model with the Correct Flavor&lt;/STRONG&gt;&lt;BR /&gt;First, determine what type of model you actually have:&lt;/P&gt;&lt;P&gt;from pyspark.ml import PipelineModel&lt;BR /&gt;import mlflow&lt;BR /&gt;import mlflow.spark&lt;/P&gt;&lt;P&gt;# Load your original model&lt;BR /&gt;model_path = "&amp;lt;volumePath&amp;gt;/sparkML_pipeline2022_2_0.model"&lt;BR /&gt;pipeline_model = PipelineModel.load(model_path)&lt;/P&gt;&lt;P&gt;# Check the model type&lt;BR /&gt;print(f"Model type: {type(pipeline_model)}")&lt;BR /&gt;print(f"Model stages: {[type(stage).__name__ for stage in pipeline_model.stages]}")&lt;/P&gt;&lt;P&gt;# Log it correctly as a Spark model&lt;BR /&gt;with mlflow.start_run():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;try:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# This should work for Spark ML models&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mlflow.spark.log_model(pipeline_model, "spark_pipeline_model")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Successfully logged as Spark model")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;except Exception as e:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print(f"Error logging as Spark model: {e}")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# If it fails, the model might have compatibility issues&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If the above fails, try this alternative approach:&lt;/P&gt;&lt;P&gt;# Alternative: wrap the Spark model as a pyfunc model&lt;BR /&gt;import mlflow.pyfunc&lt;BR /&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;class SparkModelWrapper(mlflow.pyfunc.PythonModel):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;def __init__(self, spark_model):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;self.spark_model = spark_model&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;def predict(self, context, model_input):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Convert a pandas DataFrame to a Spark DataFrame if needed&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if hasattr(model_input, 'toPandas'):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Already a Spark DataFrame&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return self.spark_model.transform(model_input)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Convert pandas to a Spark DataFrame&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# (the pyfunc context has no spark_session attribute, so get the active session)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;spark = SparkSession.builder.getOrCreate()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;spark_df = spark.createDataFrame(model_input)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result = self.spark_model.transform(spark_df)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return result.toPandas()&lt;/P&gt;&lt;P&gt;# Log the wrapped model&lt;BR /&gt;with mlflow.start_run():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mlflow.pyfunc.log_model(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"spark_pipeline_model",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;python_model=SparkModelWrapper(pipeline_model),&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;artifacts={"model_path": model_path}&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 16:21:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/not-able-to-run-pipeline-model-load-functions-unity-catalog/m-p/120950#M4106</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-04T16:21:55Z</dc:date>
    </item>
  </channel>
</rss>

