<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic mlflow spark load_model fails with FMRegressor Model error on Unity Catalog in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149400#M4558</link>
    <description>&lt;P&gt;We trained a Spark ML FMRegressor model and registered it to Unity Catalog via MLflow. When attempting to load it back using mlflow.spark.load_model, we get an&lt;BR /&gt;&lt;BR /&gt;OSError: [Errno 5] Input/output error: '/dbfs/tmp' regardless of what dfs_tmpdir path is passed.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Tried:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Using mlflow.pyfunc.spark_udf as an alternative also fails — when the features VectorUDT column is serialized through pandas during UDF execution, it loses its type and becomes a plain StructType, causing an IllegalArgumentException at inference time.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Does anyone have a fix for this?&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 26 Feb 2026 17:37:54 GMT</pubDate>
    <dc:creator>boskicl</dc:creator>
    <dc:date>2026-02-26T17:37:54Z</dc:date>
    <item>
      <title>mlflow spark load_model fails with FMRegressor Model error on Unity Catalog</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149400#M4558</link>
      <description>&lt;P&gt;We trained a Spark ML FMRegressor model and registered it to Unity Catalog via MLflow. When attempting to load it back using mlflow.spark.load_model, we get an&lt;BR /&gt;&lt;BR /&gt;OSError: [Errno 5] Input/output error: '/dbfs/tmp' regardless of what dfs_tmpdir path is passed.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Tried:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Using mlflow.pyfunc.spark_udf as an alternative also fails — when the features VectorUDT column is serialized through pandas during UDF execution, it loses its type and becomes a plain StructType, causing an IllegalArgumentException at inference time.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Does anyone have a fix for this?&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2026 17:37:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149400#M4558</guid>
      <dc:creator>boskicl</dc:creator>
      <dc:date>2026-02-26T17:37:54Z</dc:date>
    </item>
    <item>
      <title>Re: mlflow spark load_model fails with FMRegressor Model error on Unity Catalog</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149401#M4559</link>
      <description>&lt;LI-CODE lang="python"&gt;from pyspark.ml import PipelineModel
import mlflow

mlflow.set_registry_uri("databricks-uc")

local_model_path = "/local_disk0/mlflow_model"
volume_path = f"/Volumes/{catalogue}/default/mlflow_tmp/sparkml"

# Works fine - downloads to driver
mlflow.artifacts.download_artifacts(
    artifact_uri=f"models:/{model_name}@production",
    dst_path=local_model_path
)

# Copy from driver local disk to UC Volume (shared across all nodes)
dbutils.fs.cp(
    f"file://{local_model_path}/sparkml",
    f"dbfs:{volume_path}",
    recurse=True
)

# Load from UC Volume — all workers can reach this
model = PipelineModel.load(volume_path)&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;I tried this workaround just now, but is there a proper way to load the Regressor-type model?&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2026 17:40:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149401#M4559</guid>
      <dc:creator>boskicl</dc:creator>
      <dc:date>2026-02-26T17:40:59Z</dc:date>
    </item>
    <item>
      <title>Re: mlflow spark load_model fails with FMRegressor Model error on Unity Catalog</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149404#M4560</link>
      <description>&lt;P&gt;I also attempted this:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow

mlflow.set_registry_uri("databricks-uc")

loaded_model = mlflow.pyfunc.spark_udf(
    spark,
    model_uri=f"models:/{model_name}@production",
    result_type="double"
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It loaded in that command cell, and I then tried to run predictions like this:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import col, struct

assemble_transform = assembler.transform(allNewRels)

preds_final_df = (
    assemble_transform.withColumn(
        "prediction",
        loaded_model(struct(col("features")))
    ).select("id", "second_id", "prediction")
)&lt;/LI-CODE&gt;&lt;P&gt;But trying to save the above dataframe to a Delta table caused a Python worker to error out.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: Column features must be of type class org.apache.spark.ml.linalg.VectorUDT:struct&amp;lt;type:tinyint,size:int,indices:array&amp;lt;int&amp;gt;,values:array&amp;lt;double&amp;gt;&amp;gt; but was actually class org.apache.spark.sql.types.StructType:struct&amp;lt;indices:array&amp;lt;int&amp;gt;,size:bigint,type:bigint,values:array&amp;lt;double&amp;gt;&amp;gt;.&lt;/LI-CODE&gt;&lt;P&gt;I think we can't load the model because it is a Regressor and relies on this VectorUDT feature column.&lt;BR /&gt;&lt;BR /&gt;Just posting the things I tried and where they fell short.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2026 17:47:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/149404#M4560</guid>
      <dc:creator>boskicl</dc:creator>
      <dc:date>2026-02-26T17:47:57Z</dc:date>
    </item>
    <item>
      <title>Re: mlflow spark load_model fails with FMRegressor Model error on Unity Catalog</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/150085#M4568</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;This is a well-documented issue that comes down to cluster access mode and how mlflow.spark.load_model handles temporary file storage. Let me break down both problems you are hitting and provide solutions.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;PROBLEM 1: OSError: [Errno 5] Input/output error: '/dbfs/tmp'&lt;/P&gt;
&lt;P&gt;The root cause is that mlflow.spark.load_model uses the dfs_tmpdir parameter (which defaults to /tmp/mlflow) to temporarily stage model artifacts via the DBFS FUSE mount at /dbfs/. On Shared (Standard) access mode clusters, DBFS FUSE is not supported. From the Databricks documentation on access mode limitations:&lt;/P&gt;
&lt;P&gt;"DBFS root and mounts do not support FUSE" and "POSIX-style paths (/) for DBFS are not supported."&lt;/P&gt;
&lt;P&gt;This means mlflow.spark.load_model will always fail on Shared/Standard clusters because it cannot write to /dbfs/tmp.&lt;/P&gt;
&lt;P&gt;Docs: &lt;A href="https://docs.databricks.com/en/compute/access-mode-limitations.html" target="_blank"&gt;https://docs.databricks.com/en/compute/access-mode-limitations.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;SOLUTION OPTIONS FOR PROBLEM 1&lt;/P&gt;
&lt;P&gt;Option A -- Use a Dedicated (Single User) access mode cluster&lt;/P&gt;
&lt;P&gt;This is the simplest fix. Dedicated access mode clusters support DBFS FUSE mounts, so mlflow.spark.load_model works out of the box:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import mlflow

model = mlflow.spark.load_model("models:/your_model_name@production")
predictions = model.transform(test_df)&lt;/LI-CODE&gt;
&lt;P&gt;Machine learning workloads on Databricks generally require Dedicated access mode.&lt;/P&gt;
&lt;P&gt;Docs: &lt;A href="https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html" target="_blank"&gt;https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Option B -- Use a Unity Catalog Volume as a staging path (your workaround, refined)&lt;/P&gt;
&lt;P&gt;Your workaround of downloading artifacts to a UC Volume is actually a solid approach. Here is a cleaner version:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.ml import PipelineModel
import mlflow

mlflow.set_registry_uri("databricks-uc")

catalogue = "your_catalog"
schema = "your_schema"
volume = "your_volume"

local_model_path = "/local_disk0/mlflow_model"
volume_path = f"/Volumes/{catalogue}/{schema}/{volume}/sparkml_model"

# Step 1: Download artifacts to the driver's local disk
mlflow.artifacts.download_artifacts(
    artifact_uri="models:/your_model_name@production",
    dst_path=local_model_path
)

# Step 2: Copy to UC Volume (accessible by all workers)
dbutils.fs.cp(
    f"file://{local_model_path}/sparkml",
    volume_path,
    recurse=True
)

# Step 3: Load using PipelineModel.load directly
model = PipelineModel.load(volume_path)

# Step 4: Transform -- since PipelineModel wraps your FMRegressionModel,
# this works directly on Spark DataFrames with VectorUDT columns
predictions = model.transform(test_df)&lt;/LI-CODE&gt;
&lt;P&gt;This avoids the DBFS FUSE requirement entirely. UC Volumes are accessible from all cluster access modes.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Option C -- Set dfs_tmpdir to a cloud storage path&lt;/P&gt;
&lt;P&gt;If you are on a Dedicated cluster but still hitting the error, you can explicitly set dfs_tmpdir to a cloud storage path:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;model = mlflow.spark.load_model(
    "models:/your_model_name@production",
    dfs_tmpdir="dbfs:/tmp/mlflow_staging"
)&lt;/LI-CODE&gt;
&lt;P&gt;Or to a UC Volume path:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;model = mlflow.spark.load_model(
    "models:/your_model_name@production",
    dfs_tmpdir="/Volumes/your_catalog/your_schema/your_volume/mlflow_tmp"
)&lt;/LI-CODE&gt;
&lt;P&gt;&lt;BR /&gt;PROBLEM 2: mlflow.pyfunc.spark_udf FAILS WITH VectorUDT&lt;/P&gt;
&lt;P&gt;The error you see:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;IllegalArgumentException: requirement failed: Column features must be of type
class org.apache.spark.ml.linalg.VectorUDT but was actually class
org.apache.spark.sql.types.StructType&lt;/LI-CODE&gt;
&lt;P&gt;This happens because mlflow.pyfunc.spark_udf routes data through pandas during UDF execution. Spark ML's VectorUDT is a special type that pandas cannot represent natively -- it gets decomposed into a plain struct with type, size, and values fields. When the data comes back into Spark, the model expects a VectorUDT column but receives a StructType instead.&lt;/P&gt;
&lt;P&gt;This is a fundamental limitation of the pyfunc/UDF approach for Spark ML models that use VectorUDT features. The workaround is to avoid the pyfunc path entirely and use the native Spark ML API (i.e., PipelineModel.transform()) as shown in Option A or B above.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;RECOMMENDED APPROACH&lt;/P&gt;
&lt;P&gt;The cleanest solution is to use a Dedicated access mode cluster and call mlflow.spark.load_model directly. If you must use a Shared cluster, use the UC Volume workaround (Option B) to download, copy, and load via PipelineModel.load.&lt;/P&gt;
&lt;P&gt;Since mlflow.spark.load_model returns a PipelineModel anyway (MLflow wraps individual models like FMRegressionModel in a PipelineModel during logging), using PipelineModel.load directly in Option B gives you the same result. You can then call .transform() on it with your VectorAssembler-transformed DataFrame and it will work correctly since the data stays in Spark's native format without pandas serialization.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;DOCUMENTATION REFERENCES&lt;/P&gt;
&lt;P&gt;- MLflow Spark API docs - load_model: &lt;A href="https://mlflow.org/docs/latest/python_api/mlflow.spark.html" target="_blank"&gt;https://mlflow.org/docs/latest/python_api/mlflow.spark.html&lt;/A&gt;&lt;BR /&gt;- Databricks access mode limitations: &lt;A href="https://docs.databricks.com/en/compute/access-mode-limitations.html" target="_blank"&gt;https://docs.databricks.com/en/compute/access-mode-limitations.html&lt;/A&gt;&lt;BR /&gt;- Manage model lifecycle in Unity Catalog: &lt;A href="https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html" target="_blank"&gt;https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html&lt;/A&gt;&lt;BR /&gt;- MLflow Spark ML flavor guide: &lt;A href="https://mlflow.org/docs/latest/ml/traditional-ml/sparkml/" target="_blank"&gt;https://mlflow.org/docs/latest/ml/traditional-ml/sparkml/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps! Let me know if you have questions about any of the approaches.&lt;/P&gt;
&lt;P&gt;* This reply was drafted with an agent system I built, which researches responses using the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update drafts when I detect any drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.&lt;/P&gt;</description>
      <pubDate>Sat, 07 Mar 2026 20:13:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/150085#M4568</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-07T20:13:16Z</dc:date>
    </item>
    <item>
      <title>Re: mlflow spark load_model fails with FMRegressor Model error on Unity Catalog</title>
      <link>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/150376#M4578</link>
      <description>&lt;P&gt;Thank you so much for this response! I did find a fix (&lt;STRONG&gt;with what I posted and your Option B!&lt;/STRONG&gt;)&lt;/P&gt;&lt;P&gt;Option A: Didn't work. Surprisingly, I used an ML Single User cluster for everything in my job, and it wasn't a fix in my case - it still triggered the error.&lt;/P&gt;&lt;P&gt;Option B: Is exactly the route I ended up taking (I posted a similar workaround above for people to see/comment on), and I am glad you mention it because it definitely worked!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for confirming - for everyone else, Option B is the way!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 11:16:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/mlflow-spark-load-model-fails-with-fmregressor-model-error-on/m-p/150376#M4578</guid>
      <dc:creator>boskicl</dc:creator>
      <dc:date>2026-03-09T11:16:18Z</dc:date>
    </item>
  </channel>
</rss>

