
Error loading model from mlflow: java.io.StreamCorruptedException: invalid type code: 00

NSRBX
Contributor

Hello,

I'm using Databricks Connect 9.1 LTS ML in my IDE to connect to a Databricks cluster running Spark 3.1 and download a Spark model that was trained and saved with MLflow.

It seems able to find and copy the model, but the loaded result is corrupted. The same code works fine in a Databricks notebook; the problem only occurs when using Databricks Connect from my IDE.

We are getting the same error in different repositories with different models. It started appearing recently.

I have the same problem in another environment with a 10.4 LTS ML cluster and databricks-connect 10.4.6.

Do you have any idea what could be causing this?

Code:

mlflow.set_tracking_uri("databricks")

model_path = 'dbfs:/databricks/mlflow-tracking/197830957424395/7c5e692873874dadae4f67f44c1aa310/artifacts/rfModel'

model_res = mlflow.spark.load_model(model_path)
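For reference, the artifact URI above follows the fixed layout Databricks uses for MLflow tracking artifacts on DBFS (dbfs:/databricks/mlflow-tracking/&lt;experiment_id&gt;/&lt;run_id&gt;/artifacts/&lt;model_name&gt;). A small hypothetical helper that rebuilds it from its parts, useful for ruling out a malformed path before suspecting deserialization:

```python
def mlflow_dbfs_model_uri(experiment_id: str, run_id: str, artifact_name: str) -> str:
    """Build the DBFS artifact URI for a Spark model logged to a
    Databricks-hosted MLflow tracking server.

    Hypothetical helper: the path layout mirrors the URI used in the
    post above; it is not part of the mlflow API."""
    return (
        f"dbfs:/databricks/mlflow-tracking/{experiment_id}/"
        f"{run_id}/artifacts/{artifact_name}"
    )

# Reconstructs the exact path from the snippet above:
model_path = mlflow_dbfs_model_uri(
    "197830957424395", "7c5e692873874dadae4f67f44c1aa310", "rfModel"
)
```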

See the stack trace:

2022/10/06 15:17:11 INFO mlflow.spark: File 'dbfs:/databricks/mlflow-tracking/197830957424395/7c5e692873874dadae4f67f44c1aa310/artifacts/rfModel/sparkml' not found on DFS. Will attempt to upload the file.

22/10/06 15:17:39 WARN DBFS: DBFS create on /tmp/mlflow/f020cb9a-47b2-49ee-8b12-cf2754db61a9/metadata/part-00000 took 2299 ms

22/10/06 15:17:42 WARN DBFS: DBFS create on /tmp/mlflow/f020cb9a-47b2-49ee-8b12-cf2754db61a9/metadata/_SUCCESS took 1687 ms

22/10/06 15:17:46 WARN DBFS: DBFS mkdirs on /tmp/mlflow/f020cb9a-47b2-49ee-8b12-cf2754db61a9/stages/0_RandomForestClassifier_77e9017cbf4d took 2302 ms

2022/10/06 15:19:13 INFO mlflow.spark: Copied SparkML model to /tmp/mlflow/f020cb9a-47b2-49ee-8b12-cf2754db61a9

View job details at ........https....

View job details at ........ https .....

22/10/06 15:19:16 ERROR Instrumentation: java.io.StreamCorruptedException: invalid type code: 00

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1698)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)

at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488)

at sun.reflect.GeneratedMethodAccessor419.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2296)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)

at org.apache.spark.sql.util.ProtoSerializer.$anonfun$deserializeObject$1(ProtoSerializer.scala:6631)

at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)

at org.apache.spark.sql.util.ProtoSerializer.deserializeObject(ProtoSerializer.scala:6616)

at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:728)

at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:477)

at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)

at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:372)

at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:323)

at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:309)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:359)

at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)

at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:336)

at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:167)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)

at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)

at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)

at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)

at org.eclipse.jetty.server.Server.handle(Server.java:516)

at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)

at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)

at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)

at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)

at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)

at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)

at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)

at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)

at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)

at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)

at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)

at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)

at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)

at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)

at java.lang.Thread.run(Thread.java:748)

...

py4j.protocol.Py4JJavaError: An error occurred while calling o588.load.

: java.io.StreamCorruptedException: invalid type code: 00

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1698)

Thanks for your help.

1 ACCEPTED SOLUTION

NSRBX
Contributor

Hi @Kaniz Fatma and @Shanmugavel Chandrakasu,

It works after putting hadoop.dll into the C:\Windows\System32 folder.

I have hadoop version 3.3.1.

I already had winutils.exe in the Hadoop bin folder.

Regards

Nath
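For anyone hitting this on Windows, the fix boils down to three native-Hadoop prerequisites: HADOOP_HOME set, winutils.exe in the Hadoop bin folder, and hadoop.dll in C:\Windows\System32 (a missing hadoop.dll typically surfaces as the java.lang.UnsatisfiedLinkError on NativeIO$Windows.access0 seen elsewhere in this thread). A hypothetical sanity-check sketch, assuming the standard HADOOP_HOME layout; it only reports what it finds:

```python
import os
from pathlib import Path

def check_hadoop_native_setup() -> list:
    """Check the three Windows native-Hadoop pieces discussed in this
    thread: HADOOP_HOME, winutils.exe, and hadoop.dll.

    Returns a list of problem descriptions (empty list means the
    setup looks complete). On non-Windows systems every check will
    simply report what is missing."""
    problems = []

    hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME environment variable is not set")
    elif not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found in %HADOOP_HOME%\\bin")

    system32 = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
    if not (system32 / "hadoop.dll").is_file():
        problems.append("hadoop.dll not found in C:\\Windows\\System32")

    return problems

for problem in check_hadoop_native_setup():
    print("Missing:", problem)
```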


6 REPLIES

shan_chandra
Honored Contributor III

@SERET Nathalie - The client needs to be updated to the latest version to fix this issue: https://pypi.org/project/databricks-connect/#history

Kaniz
Community Manager

Hi @SERET Nathalie, We haven't heard from you since the last response from @Shanmugavel Chandrakasu, and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Nath
New Contributor II

Hello,

I upgraded databricks-connect to version 10.4.12 (mlflow version 1.26), but it still doesn't work.

I have winutils.exe in my venv under Lib\site-packages\pyspark\bin.

My HADOOP_HOME environment variable is set correctly.

Python version: 3.8.10.

Thanks for your help.

See the stack trace:

2022/10/14 17:18:09 INFO mlflow.spark: File 'dbfs:/databricks/mlflow-tracking/67260056032267/6580b479a0ba43beaa3dd7971561fbb7/artifacts/model_rf/sparkml' not found on DFS. Will attempt to upload the file.

Traceback (most recent call last):

 File "C:\Users\NSR\py-packages\test\test_mlflow.py", line 21, in <module>

   model = exp.get_model(nom="model_rf")

 File "C:\Users\NSR\py-packages\ircem\mlflow.py", line 178, in get_model

   return mlflow.spark.load_model(model_path)

 File "D:\venv_python\Python38\lib\site-packages\mlflow\spark.py", line 711, in load_model

   return _load_model(model_uri=model_uri, dfs_tmpdir_base=dfs_tmpdir)

 File "D:\venv_python\Python38\lib\site-packages\mlflow\spark.py", line 659, in _load_model

   model_uri = _HadoopFileSystem.maybe_copy_from_uri(model_uri, dfs_tmpdir)

 File "D:\venv_python\Python38\lib\site-packages\mlflow\spark.py", line 382, in maybe_copy_from_uri

   return cls.maybe_copy_from_local_file(_download_artifact_from_uri(src_uri), dst_path)

 File "D:\venv_python\Python38\lib\site-packages\mlflow\spark.py", line 349, in maybe_copy_from_local_file

   cls.copy_from_local_file(src, dst, remove_src=False)

 File "D:\venv_python\Python38\lib\site-packages\mlflow\spark.py", line 331, in copy_from_local_file

   cls._fs().copyFromLocalFile(remove_src, cls._local_path(src), cls._remote_path(dst))

 File "D:\venv_python\Python38\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__

   return_value = get_return_value(

 File "D:\venv_python\Python38\lib\site-packages\pyspark\sql\utils.py", line 117, in deco

   return f(*a, **kw)

 File "D:\venv_python\Python38\lib\site-packages\py4j\protocol.py", line 326, in get_return_value

   raise Py4JJavaError(

py4j.protocol.Py4JJavaError: An error occurred while calling o334.copyFromLocalFile.

: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

   at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)

   at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)

   at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1215)

   at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1420)

   at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)

   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)

   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)

   at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)

   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:406)

   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:390)

   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2482)

   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2448)

   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

   at java.lang.reflect.Method.invoke(Unknown Source)

   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)

   at py4j.Gateway.invoke(Gateway.java:295)

   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

   at py4j.commands.CallCommand.execute(CallCommand.java:79)

   at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)

   at py4j.ClientServerConnection.run(ClientServerConnection.java:115)

   at java.lang.Thread.run(Unknown Source)


Kaniz
Community Manager

Hi @SERET Nathalie, thank you for your response, and for sharing your solution with the community. Thank you for being an integral part of it.

Benglish11
New Contributor II

I am having the same issue with Databricks Connect 10.4.22 and came across this old post. I am using Linux, though; what would be the equivalent fix here? (hadoop.dll is a Windows library.)
