
Parsing Japanese characters in Spark & Databricks

RiyazAli
Valued Contributor II

I'm trying to read data that has Japanese headers and may also contain Japanese values. Currently, when I set header to True, all the characters come out jumbled. Can anyone help me parse these Japanese characters correctly?

Riz
2 REPLIES

Avinash_Narala
Valued Contributor II

Hi @RiyazAli ,

You need to tell Spark which encoding the file was written in when you read it. For example, if the Japanese data is stored as UTF-8, set the encoding option to UTF-8:

CREATE OR REPLACE TEMP VIEW japanese_data
USING csv
OPTIONS (path 'path/to/japanese_data.csv', header 'true', encoding 'UTF-8')

You can also use various natural language processing (NLP) libraries and tools in Databricks.
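For anyone reading the file from a notebook instead of SQL, the same option can be passed through the DataFrame reader. This is only a minimal sketch: the function name `read_csv_with_encoding` is hypothetical, `spark` is assumed to be the SparkSession that a Databricks notebook already provides, and the path is a placeholder.

```python
def read_csv_with_encoding(spark, path, encoding="UTF-8"):
    """Read a CSV file that has a header row, decoding it with `encoding`.

    `spark` is assumed to be an existing SparkSession (hypothetical helper,
    not part of any library).
    """
    return (
        spark.read
        .option("header", True)        # first line holds the column names
        .option("encoding", encoding)  # charset used to decode the file bytes
        .csv(path)
    )
```

In a notebook this would be called as `read_csv_with_encoding(spark, "path/to/japanese_data.csv", "SHIFT_JIS")`. The encoding has to match how the file was actually written, otherwise the headers and values will still look jumbled.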

RiyazAli
Valued Contributor II

Thank you, @Avinash_Narala 

I did use the encoding option to parse the data again, but this time I set it to `SHIFT_JIS`, which solved the problem. I appreciate the quick response!
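In case it helps someone who does not know the file's encoding up front, here is a rough sketch of one way to narrow it down with plain Python before pointing Spark at the file. The function name `guess_encoding` and the candidate list are my own assumptions, and trial decoding is only a heuristic: it returns the first candidate that decodes without an error, which is not guaranteed to be the right one.

```python
def guess_encoding(raw, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return the first candidate codec that decodes `raw` without error.

    Heuristic only: some byte sequences are valid in more than one codec,
    so the result should be checked by eye. Returns None if nothing fits.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codec, try the next
        return name
    return None


# Illustrative header row, encoded the way a Shift_JIS file would store it.
sample = "名前,住所".encode("shift_jis")
print(guess_encoding(sample))
```

On a real file you would pass the first few kilobytes, read in binary mode, in place of `sample`. Note that these are Python codec names; the value passed to Spark's encoding option was `SHIFT_JIS`.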

Riz
