11-13-2017 01:51 AM
I'm running Spark 2.2.0 at the moment. I'm having trouble importing data of Mexican origin: the text contains special (accented) characters, and certain columns contain multiline values.
Ideally, this is the command I'd like to run:
T_new_exp = spark.read\
.option("charset", "ISO-8859-1")\
.option("parserLib", "univocity")\
.option("multiLine", "true")\
.schema(schema)\
.csv(file)
However, the above gives me correctly split rows but the wrong charset. Instead of displaying e-acute (é), for example, I get the replacement character (U+FFFD). Only when I remove the multiLine option do I get the right charset (but then the multiline issue is not fixed).
The only workaround I have for now is to preprocess the data separately before it is loaded into Databricks; that is, fix the multiline issue first in Unix and let Databricks handle the charset issues afterwards.
Is there a simpler way than this?
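For reference, a variation on that preprocessing step can be sketched in plain Python. This is only a minimal sketch under assumptions: it converts the charset up front (rather than the line breaks), on the assumption that the multiLine reader would then cope with a UTF-8 file; the function name and file paths are hypothetical and not part of the original setup.

```python
def transcode_to_utf8(src_path, dst_path, src_encoding="ISO-8859-1"):
    """Rewrite a text file as UTF-8 without touching its line breaks.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # newline="" disables newline translation on both sides, so line breaks
    # embedded in quoted CSV fields are copied through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        # Copy in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fin.read(1 << 20), ""):
            fout.write(chunk)
```

The converted file could then be read with only the multiLine option set, assuming the reader treats the input as UTF-8 in that mode.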
08-29-2018 05:43 AM
Did you try the encoding option?
.option("encoding", "UTF-8") .csv(inputPath)
08-29-2018 05:44 AM
@Hafidz Zulkifli check my answer
08-29-2018 07:58 PM
@kali.tummala@gmail.com Tried it just now. It didn't work. There are two parts to the problem: one is handling multiline values, the other is handling a differing charset.
09-07-2018 06:58 AM
Are you sure it's the parsing that's the issue, and not simply the display?
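One way to tell the two cases apart is to look at the actual code points in a value rather than at how it renders. The following is a minimal sketch in plain Python, not tied to Spark; the helper name and the sample value are made up for illustration. If U+FFFD is really present in the string, the damage happened while decoding, not while displaying.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, inserted by decoders for invalid bytes


def contains_replacement_char(value):
    """Return True if the string really contains U+FFFD (hypothetical helper)."""
    return value is not None and REPLACEMENT_CHAR in value


# Illustrative sample: ISO-8859-1 bytes for a name with an e-acute.
raw_bytes = "Jos\u00e9".encode("iso-8859-1")

# Decoding with the wrong charset puts U+FFFD into the data itself.
misdecoded = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the right charset keeps the original character.
correct = raw_bytes.decode("iso-8859-1")

# Printing code points sidesteps any font or terminal display issue.
print([hex(ord(ch)) for ch in misdecoded])
print([hex(ord(ch)) for ch in correct])
```

The same check could be applied to a handful of values collected from the affected column, assuming a small sample is enough to show the problem.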
10-01-2019 04:32 AM
Hi,
Did anyone find a solution for this?
04-22-2020 10:17 AM
Please make sure you are using or enforcing Python 3. Python 2 is the default, and it will have issues with encoding.
05-27-2020 06:22 AM
.option("charset", "iso-8859-1")
.option("multiLine", True) .option("lineSep", '\n\r')
09-25-2021 04:18 AM
You could also potentially use the .withColumn() function on the DataFrame, together with the pyspark.sql.functions.encode function, to convert the character set to the one you need.