06-05-2023 01:15 AM
I am encountering an issue while attempting to create a data profile on clusters using Docker Container Service (version 10.4 LTS). I keep receiving the following exception:
java.nio.charset.MalformedInputException: Input length = 1
What's puzzling is that I have tested the data profile creation process on clusters without Docker, using the same library dependencies, and it works flawlessly. However, when utilizing Docker Container Service, this exception consistently occurs regardless of the input data.
I have made several attempts with different data sets, but the problem persists. I suspect that Docker Container Service may be interfering with the character encoding or input handling in some way, leading to this exception.
Has anyone else encountered a similar issue with Docker Container Service and the java.nio.charset.MalformedInputException? I would greatly appreciate any insights, experiences, or possible solutions to help me resolve this problem.
06-08-2023 07:30 AM
The MalformedInputException is an exception in the java.nio.charset package in Java that indicates that an input sequence is malformed or cannot be decoded correctly using a specific character set.
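To make the failure mode concrete, here is a minimal Python sketch of the same mechanism. It is only an analogy: Python raises UnicodeDecodeError where Java raises MalformedInputException, and the sample string and the helper name `try_decode` are made up for illustration. The same bytes decode cleanly as UTF-8 but are rejected by a charset that does not accept them, which is consistent with the "Input length = 1" message (a single byte could not be decoded).

```python
# Illustration only: Python's UnicodeDecodeError stands in for Java's
# MalformedInputException. The sample text and helper name are hypothetical.
data = "café".encode("utf-8")  # the "é" becomes the two bytes 0xC3 0xA9


def try_decode(raw: bytes, charset: str) -> str:
    """Return the decoded text, or a short note if the bytes are rejected."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError as exc:
        return f"decode failed with {charset}: {exc.reason}"


print(try_decode(data, "utf-8"))  # decodes without error
print(try_decode(data, "ascii"))  # rejected: 0xC3 is not a valid ASCII byte
```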
In this case, `java.nio.charset.MalformedInputException` is caused by a difference in the default locale settings on the DCS cluster. Once the settings below are in place, your code should run fine.
Add the following settings to the cluster's environment variables:
LANG=C.UTF-8
LC_ALL=C.UTF-8
By setting LANG=C.UTF-8 and LC_ALL=C.UTF-8, you are configuring the locale to use the UTF-8 character encoding, which can help address issues related to character encoding and malformed input when working with Java processes.
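If you want to confirm that the variables are actually visible to a process on the cluster, a small check like the one below can help. This is a sketch under assumptions: the helper names `effective_locale` and `is_utf8_locale` are hypothetical, the lookup order follows the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG), and running it in a notebook only inspects the environment of the Python process that executes it, not every JVM on the cluster.

```python
import os


def effective_locale(env) -> str:
    """Resolve the character-type locale the way POSIX does:
    LC_ALL wins, then LC_CTYPE, then LANG; unset or empty means "C"."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def is_utf8_locale(env) -> bool:
    """True if the resolved locale names a UTF-8 codeset, e.g. C.UTF-8."""
    # A locale looks like language_TERRITORY.codeset@modifier; keep the codeset.
    codeset = effective_locale(env).partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# Inspect the environment of the current process.
print(effective_locale(os.environ), is_utf8_locale(os.environ))
```

Note that LC_ALL overrides LANG, so setting only LANG has no effect if the image already exports a non-UTF-8 LC_ALL.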
06-12-2023 04:52 AM
Thank you for the reply! I have confirmed that the above solution fixed the exception.
06-09-2023 04:39 AM
Hi @Adrianna Klank,
We haven't heard from you since the last response from @Akash Bhat, and I was checking back to see if the suggestion helped you.
Otherwise, if you have found a solution yourself, please share it with the community, as it may help others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.