Data Engineering

Exception "java.nio.charset.MalformedInputException: Input length = 1" when creating data profile on Docker Container Service (10.4 LTS)

adrianna2942842
New Contributor III

I am encountering an issue while attempting to create a data profile on clusters using Docker Container Service (version 10.4 LTS). I keep receiving the following exception:

java.nio.charset.MalformedInputException: Input length = 1

What's puzzling is that I have tested the data profile creation process on clusters without Docker, using the same library dependencies, and it works flawlessly. However, when utilizing Docker Container Service, this exception consistently occurs regardless of the input data.

I have made several attempts with different data sets, but the problem persists. I suspect that Docker Container Service may be interfering with the character encoding or input handling in some way, leading to this exception.

Has anyone else encountered a similar issue with Docker Container Service and the java.nio.charset.MalformedInputException? I would greatly appreciate any insights, experiences, or possible solutions to help me resolve this problem.

1 ACCEPTED SOLUTION


User16752242622
Valued Contributor

The MalformedInputException is an exception in the java.nio.charset package in Java that indicates that an input sequence is malformed or cannot be decoded correctly using a specific character set.
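To make the failure mode concrete, here is a minimal sketch in Python rather than Java, so it is only an analogue of the JVM behaviour and not the code path the data profiler uses. It assumes the container's default locale resolves to an ASCII-only charset, which the thread does not confirm. The helper name `try_decode` is hypothetical.

```python
# Python analogue of the Java failure: a strict decoder rejects bytes that
# are not valid in the charset it was told to use.
utf8_bytes = "café".encode("utf-8")  # 'é' becomes two bytes: 0xC3 0xA9


def try_decode(data: bytes, charset: str) -> str:
    """Decode strictly and report the outcome instead of raising."""
    try:
        return "ok: " + data.decode(charset)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the decoder rejected
        return f"failed at byte {err.start} using {charset}"


print(try_decode(utf8_bytes, "utf-8"))  # decodes cleanly
print(try_decode(utf8_bytes, "ascii"))  # rejected: 0xC3 is not valid ASCII
```

The same bytes decode or fail depending only on the charset the decoder is given, which is why a locale difference between two otherwise identical clusters can produce this exception.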

In this case, `java.nio.charset.MalformedInputException` is caused by a difference in the default locale settings on the DCS cluster. After you add the environment variables below to the cluster's environment variables, your code should run fine.

Add the following settings to the cluster's environment variables:

LANG=C.UTF-8

LC_ALL=C.UTF-8

By setting LANG=C.UTF-8 and LC_ALL=C.UTF-8, you are configuring the locale to use the UTF-8 character encoding, which can help address issues related to character encoding and malformed input when working with Java processes.
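One way to check whether the variables took effect after a cluster restart is to inspect them from a notebook cell. The sketch below is an assumption-laden diagnostic, not part of the original answer: the helper names `locale_report` and `has_utf8_locale` are hypothetical, and it only inspects the Python process, which may not see exactly the same environment as the JVM.

```python
import locale
import os


def locale_report(env=None):
    """Collect the locale-related settings visible to this Python process."""
    env = os.environ if env is None else env
    return {
        "LANG": env.get("LANG"),
        "LC_ALL": env.get("LC_ALL"),
        # Encoding Python derives from the current locale settings
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def has_utf8_locale(env=None):
    """True when both variables from the answer end in a UTF-8 codeset."""
    env = os.environ if env is None else env
    return all(
        env.get(name, "").upper().endswith(("UTF-8", "UTF8"))
        for name in ("LANG", "LC_ALL")
    )


print(locale_report())
print(has_utf8_locale())
```

If `has_utf8_locale()` prints `False` on the DCS cluster but `True` on the cluster without Docker, that would be consistent with the locale difference described above.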


3 REPLIES

Thank you for the reply! I have confirmed that the above solution fixes the exception.

Vartika
Databricks Employee

Hi @Adrianna Klank,

We haven't heard from you since the last response from @Akash Bhat, and I was checking back to see if the suggestion helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
