Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Malformed Input Exception while saving or retrieving Table

Chandraw
New Contributor III

Hi everyone,

I am using DBR version 13 with managed tables in a custom catalog; the table location is AWS S3.

I am running the notebook on a single-user cluster.

I am facing a MalformedInputException while saving data to the tables or reading it back.

When I run the notebook for the first time and the tables are empty, it works fine and the data is saved to the tables. But when I immediately rerun the notebook, I can no longer save data and I get the exception in the subject line.

However, if I run the notebook again the next day, or if I delete everything from the table, it works fine again.

I am using df.write.mode('overwrite').saveAsTable('tablename') for writing, and DeltaTable.forName() for reading the Delta table.

Error:

Py4JJavaError: An error occurred while calling z:io.delta.tables.DeltaTable.forName (or saveAsTable()):
java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.read1(BufferedReader.java:210)
	at java.io.BufferedReader.read(BufferedReader.java:286)
	at java.io.Reader.read(Reader.java:140)
	at scala.io.BufferedSource.mkString(BufferedSource.scala:98)
	at com.databricks.common.client.RawDBHttpClient.getResponseBody(DBHttpClient.scala:1229)
	at com.databricks.common.client.RawDBHttpClient.$anonfun$httpRequestInternal$1(DBHttpClient.scala:1191)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:571)

Thanks in advance for any suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions

Chandraw
New Contributor III

@Retired_mod  The issue was resolved as soon as I deployed to a multi-node dev cluster.

The issue only occurs on single-user clusters. It looks like a limitation of running all the updates on one node of a distributed system.


2 REPLIES 2

Chandraw
New Contributor III

Thanks @Retired_mod for your response.

Encoding issues: I am reading data from a table in the same catalog and, after a bunch of transformations, saving it to another table in the same catalog. I believe encoding should not be the issue.

Check Column Data Types: Before saving data to the table, I ensure the schema matches the table schema by casting the columns to the corresponding data types. I am not explicitly creating a schema using StructType, but it matches the table schema at the end of the transformations. As a result, I am able to save data to the table the first time.

Moreover, the source data does not change while I am testing.

Retry Delay: I tried putting some delay between each interaction with the Delta table. It did not help.
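For what it's worth, the retry-with-delay attempt can be sketched as a small helper. The helper name and the usage line are hypothetical, just showing the shape of the approach:

```python
import time

def with_retries(fn, attempts=3, delay_s=5):
    """Call fn(), retrying up to `attempts` times with a fixed pause between tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay_s)

# Hypothetical usage around a Delta write:
# with_retries(lambda: df.write.mode("overwrite").saveAsTable("tablename"))
```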

Delta Table Consistency: Can I manage or check consistency programmatically? It appears Delta table transactions are managed internally.

Thank you again very much for your suggestions.

