<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: databricks-connect error when executing sparkml in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23742#M1342</link>
    <description>&lt;P&gt;For information, upgrading the Python libraries does not resolve every problem.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This code works fine in a Databricks notebook:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import mlflow
model = mlflow.spark.load_model('runs:/cb6ff62587a0404cabeadd47e4c9408a/model')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, it fails in IntelliJ with databricks-connect.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone have a solution?&lt;/P&gt;</description>
    <pubDate>Fri, 10 Mar 2023 11:00:18 GMT</pubDate>
    <dc:creator>Oliver_Floyd</dc:creator>
    <dc:date>2023-03-10T11:00:18Z</dc:date>
    <item>
      <title>databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23734#M1334</link>
      <description>&lt;P&gt;I use databricks-connect, and Spark jobs that work with Spark DataFrames run fine. But when I trigger Spark ML code, I get errors.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For example, after executing this line from the notebook at &lt;A href="https://docs.databricks.com/_static/notebooks/gbt-regression.html" alt="https://docs.databricks.com/_static/notebooks/gbt-regression.html" target="_blank"&gt;https://docs.databricks.com/_static/notebooks/gbt-regression.html&lt;/A&gt;:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;pipelineModel = pipeline.fit(train)&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;22/11/04 09:28:15 ERROR Instrumentation: java.io.IOException: unexpected exception type
	at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750)
	at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
---------------------------
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
---------------------------
Caused by: java.lang.IllegalArgumentException: Illegal lambda deserialization
	at scala.runtime.LambdaDeserializer$.makeCallSite$1(LambdaDeserializer.scala:89)
	at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:114)
	at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)
---------------------------
py4j.protocol.Py4JJavaError: An error occurred while calling o806.fit.
: java.io.IOException: unexpected exception type
	at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1750)
	at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1280)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
---------------------------
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
---------------------------
Caused by: java.lang.IllegalArgumentException: Illegal lambda deserialization
	at scala.runtime.LambdaDeserializer$.makeCallSite$1(LambdaDeserializer.scala:89)
	at scala.runtime.LambdaDeserializer$.deserializeLambda(LambdaDeserializer.scala:114)
	at scala.runtime.LambdaDeserialize.deserializeLambda(LambdaDeserialize.java:38)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Does anyone know how to fix it?&lt;/P&gt;</description>
      <pubDate>Fri, 04 Nov 2022 13:40:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23734#M1334</guid>
      <dc:creator>Troy</dc:creator>
      <dc:date>2022-11-04T13:40:31Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23736#M1336</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma​, I am using 10.4.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2022 23:25:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23736#M1336</guid>
      <dc:creator>Troy</dc:creator>
      <dc:date>2022-11-11T23:25:30Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23737#M1337</link>
      <description>&lt;P&gt;I'm encountering the exact same problem, also with databricks-connect 10.4.12. The models in our production pipeline work fine because they are run through the Databricks UI rather than databricks-connect. In our CI test pipeline, however, they are run through databricks-connect in Docker containers (using Concourse-CI). The codebase is the same. When I run the same code manually on my local machine, connected to our cluster via databricks-connect, I hit the same problem as Troy.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In fact, I tried to run a very minimal random forest classifier and I STILL run into the same problem. Here is the code I use:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import numpy as np
import pandas as pd
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()
# Build a small random dataset: three numeric features and a binary label.
data = spark.createDataFrame(
    pd.DataFrame({
        "feature_a": np.random.random(100),
        "feature_b": np.random.random(100),
        "feature_c": np.random.random(100),
        "label": np.random.choice([0, 1], 100),
    })
)
vector_assembler = VectorAssembler(
    inputCols=[f"feature_{n}" for n in ["a", "b", "c"]],
    outputCol="features",
)
parsed_data = (
    vector_assembler
    .transform(data)
    .drop(*[f"feature_{n}" for n in ["a", "b", "c"]])
)
model = RandomForestClassifier()
model.fit(parsed_data)
# Error thrown here, very similar to Troy's.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm attaching my error output as well.&lt;/P&gt;</description>
      <pubDate>Sat, 26 Nov 2022 23:59:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23737#M1337</guid>
      <dc:creator>matt_chan</dc:creator>
      <dc:date>2022-11-26T23:59:39Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23738#M1338</link>
      <description>&lt;P&gt;@Kaniz Fatma​&amp;nbsp;any pointers at all? &lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 16:05:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23738#M1338</guid>
      <dc:creator>matt_chan</dc:creator>
      <dc:date>2022-12-01T16:05:35Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23739#M1339</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Same problem here in France.&lt;/P&gt;&lt;P&gt;@Kaniz Fatma&amp;nbsp;Can we have some answers?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Mar 2023 08:16:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23739#M1339</guid>
      <dc:creator>Oliver_Floyd</dc:creator>
      <dc:date>2023-03-08T08:16:03Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23741#M1341</link>
      <description>&lt;P&gt;Good morning,&lt;/P&gt;&lt;P&gt;For information, the error is not related to the limitations of databricks-connect.&lt;/P&gt;&lt;P&gt;After various tests, it turns out that, in my case, the libraries in the venv used with databricks-connect needed updating.&lt;/P&gt;&lt;P&gt;Here are the Python library updates I made:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;databricks-connect from 10.4.12 to 10.4.21&lt;/LI&gt;&lt;LI&gt;databricks-cli from 0.17.3 to 0.17.4&lt;/LI&gt;&lt;LI&gt;mlflow from 1.26.1 to 2.2.1&lt;/LI&gt;&lt;LI&gt;protobuf from 3.20.0 to 3.20.3&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that I work with a 10.4 LTS cluster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;After these updates, the code example above works fine in IntelliJ with databricks-connect.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2023 09:37:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23741#M1341</guid>
      <dc:creator>Oliver_Floyd</dc:creator>
      <dc:date>2023-03-09T09:37:33Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect error when executing sparkml</title>
      <link>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23742#M1342</link>
      <description>&lt;P&gt;For information, upgrading the Python libraries does not resolve every problem.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This code works fine in a Databricks notebook:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import mlflow
model = mlflow.spark.load_model('runs:/cb6ff62587a0404cabeadd47e4c9408a/model')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, it fails in IntelliJ with databricks-connect.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone have a solution?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Mar 2023 11:00:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/databricks-connect-error-when-executing-sparkml/m-p/23742#M1342</guid>
      <dc:creator>Oliver_Floyd</dc:creator>
      <dc:date>2023-03-10T11:00:18Z</dc:date>
    </item>
  </channel>
</rss>

