Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Parsing Japanese characters in Spark & Databricks

RiyazAli
Valued Contributor II

I'm trying to read data that has Japanese headers and may also contain Japanese values. Currently, when I set header to True, the column names come back as jumbled characters. Can anyone help me parse these Japanese characters correctly?

Riz
2 REPLIES

Avinash_Narala
Valued Contributor II

Hi @RiyazAli ,

You need to read the data using the encoding it was written in. For example, if the Japanese data is stored as UTF-8, pass UTF-8 as the encoding option:

-- Define the view over the CSV file and declare the file's encoding
CREATE OR REPLACE TEMPORARY VIEW japanese_data
USING csv
OPTIONS (path 'path/to/japanese_data.csv', header 'true', encoding 'UTF-8');

You can also use various natural language processing (NLP) libraries and tools in Databricks.

RiyazAli
Valued Contributor II

Thank you, @Avinash_Narala 

I did use the encoding option to parse the data again, but this time with the `SHIFT_JIS` encoding, and that solved the problem. Appreciate the quick response!

Riz
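
For anyone landing here with the same symptom, here is a minimal Python sketch of the approach described in this thread: work out which encoding the file was actually written in, then pass that name to Spark's CSV `encoding` option. The helper names (`detect_encoding`, `read_japanese_csv`), the candidate list, and the sample header are assumptions made up for illustration; they are not from the posts above. Strict decoding is only a heuristic, since some byte sequences are valid in more than one encoding, so it is worth eyeballing the decoded header as well.

```python
from typing import Optional, Sequence

# Hypothetical candidate list: encodings commonly seen in Japanese CSV files.
CANDIDATES = ("UTF-8", "Shift_JIS", "EUC-JP")


def detect_encoding(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes `raw` without errors, else None."""
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding raises on invalid byte sequences
            return name
        except UnicodeDecodeError:
            continue
    return None


def read_japanese_csv(spark, path: str, encoding: str):
    """Read a CSV with a header row, telling Spark which encoding to decode with.

    `spark` is an existing SparkSession (provided automatically in Databricks notebooks).
    """
    return (
        spark.read
        .option("header", True)
        .option("encoding", encoding)
        .csv(path)
    )


if __name__ == "__main__":
    # Made-up sample header and row, encoded the way a Shift_JIS file would be.
    sample = "名前,住所\n太郎,東京\n".encode("shift_jis")
    # Decoding with the wrong charset is what produces the jumbled header.
    print(sample.decode("latin-1"))
    print(detect_encoding(sample))
```

In a notebook, the sample bytes would be replaced with the first few kilobytes of the real file, and the detected name passed on, for example `read_japanese_csv(spark, path, "Shift_JIS")`.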
