
Spark Read CSV doesn't preserve the double quotes while reading!

DineshKumar — Mon, 24 Aug 2020 16:52:19 GMT

Hi, I am trying to read a CSV file in which one column contains double quotes, like below:

James,Butt,"Benton, John B Jr",6649 N Blue Gum St
Josephine,Darakjy,"Chanay, Jeffrey A Esq",4 B Blue Ridge Blvd
Art,Venere,"Chemel, James L Cpa",8 W Cerritos Ave #54
Lenna,Paprocki,Feltz Printing Service,639 Main St,Anchorage
Donette,Foller,Printing Dimensions,34 Center St,Hamilton
Simona,Morasca,"Chapman, Ross E Esq",3 Mcauley Dr

I am using the code below to keep the double quotes as they appear in the CSV file (some rows have double quotes and some don't):

val df_usdata = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", "\"")
  .load("file:///E://data//csvdata.csv")
df_usdata.show(false)

But the double quotes are not preserved in the DataFrame, and they should be.

The .option("quote","\"") setting does not seem to work. I am using Spark 2.3.1.

The output should look like this:

+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name             |address              | 
+----------+---------+-------------------------+---------------------+ 
|James     |Butt     |"Benton, John B Jr"      |6649 N Blue Gum St   |
|Josephine |Darakjy  |"Chanay, Jeffrey A Esq"  |4 B Blue Ridge Blvd  |
|Art       |Venere   |"Chemel, James L Cpa"    |8 W Cerritos Ave #54 |
|Lenna     |Paprocki |Feltz Printing Service   |639 Main St          | 
|Donette   |Foller   |Printing Dimensions      |34 Center St         | 
|Simona    |Morasca  |"Chapman, Ross E Esq"    |3 Mcauley Dr         |
+----------+---------+-------------------------+---------------------+

Regards, Dinesh Kumar
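For context: in CSV the double quote is a field enclosure, not part of the value, so Spark's CSV reader consumes the enclosing quotes, and `.option("quote", "\"")` only names that enclosing character (it is already the default). One possible workaround is to bypass the CSV parser and split each line only on commas that sit outside quotes, which leaves the quotes in the values. The sketch below is plain Scala, not a Spark API; `quoteAwareComma` and `splitKeepingQuotes` are hypothetical names, and it assumes quotes are always balanced and no field contains an embedded newline.

```scala
// A comma counts as a delimiter only if it is followed by an even number of
// double quotes up to the end of the line, i.e. it is NOT inside a quoted field.
// The quote characters themselves are left in the resulting values.
val quoteAwareComma: String = """,(?=(?:[^"]*"[^"]*")*[^"]*$)"""

// Hypothetical helper: splits one CSV line while keeping enclosing quotes.
// A limit of -1 keeps trailing empty fields (e.g. a line ending in a comma).
def splitKeepingQuotes(line: String): Array[String] =
  line.split(quoteAwareComma, -1)

val sample = Seq(
  "James,Butt,\"Benton, John B Jr\",6649 N Blue Gum St",
  "Simona,Morasca,\"Chapman, Ross E Esq\",3 Mcauley Dr"
)

// Print each line's fields separated by " | " so the kept quotes are visible.
sample.map(splitKeepingQuotes).foreach(fields => println(fields.mkString(" | ")))
```

In Spark, the same Java regex could in principle be passed to `org.apache.spark.sql.functions.split` on the `value` column returned by `spark.read.text(...)`, with the header line filtered out by hand; treat that as an untested outline rather than a drop-in replacement for the CSV reader.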

Re: Spark Read CSV doesn't preserve the double quotes while reading!

DineshKumar — Tue, 25 Aug 2020 16:02:06 GMT

When I tried with

.option("quote","")

and .option("quote","\u0000"), the company_name values got split into the next column, like below:

+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name             |address              |
+----------+---------+-------------------------+---------------------+
|James     |Butt     |"Benton                  | John B Jr"          |
|Josephine |Darakjy  |"Chanay                  | Jeffrey A Esq"      |
|Art       |Venere   |"Chemel                  | James L Cpa"        |
|Lenna     |Paprocki |Feltz Printing Service   |639 Main St          |
|Donette   |Foller   |Printing Dimensions      |34 Center St         |
|Simona    |Morasca  |"Chapman                 | Ross E Esq"         |
+----------+---------+-------------------------+---------------------+
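Why the values shift: with quoting turned off, the comma inside `"Benton, John B Jr"` is treated as an ordinary delimiter, so that row yields five tokens for a four-column header. The plain-Scala sketch below imitates this with `String.split`; it is only an approximation of the reader's behavior, not Spark's actual parser.

```scala
// With quoting disabled, every comma is a delimiter, including the one
// inside "Benton, John B Jr", so the company name is cut in two.
val line = "James,Butt,\"Benton, John B Jr\",6649 N Blue Gum St"
val tokens: Array[String] = line.split(",", -1)

// Print each token with its position; the brackets make leading spaces visible.
tokens.zipWithIndex.foreach { case (token, i) => println(s"$i: [$token]") }
```

Tokens 2 and 3 correspond to the `"Benton` and ` John B Jr"` values in the output above; in PERMISSIVE mode (the default), Spark 2.x drops tokens beyond the number of columns, which is why the real address disappears.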

Re: Spark Read CSV doesn't preserve the double quotes while reading!

Forum_Admin — Fri, 06 Aug 2021 09:16:48 GMT

Try using both of these options:

.option("quote", "\"")

.option("escape", "\"")
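A minimal sketch of these two options, assuming a local Spark 2.2 or later session (the `csv(Dataset[String])` overload used here was added in 2.2). Spark's default escape character is a backslash, so a field that follows the RFC 4180 convention of doubling quotes (`""`) inside a quoted value may be misread; setting `escape` to `"` makes `""` read as one literal quote. The sample rows and column names are hypothetical. Note that the enclosing quotes are still removed from the value, so this combination helps with column shifting but does not by itself keep the quotes the original question asks about.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for a self-contained run; in spark-shell, getOrCreate()
// simply returns the existing `spark` session.
val spark: SparkSession = SparkSession.builder()
  .appName("quote-escape-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample: the comment field contains a comma and uses
// RFC 4180 style doubled quotes ("") to embed literal quotes.
val lines = Seq(
  "id,comment,status",
  "1,\"He said \"\"hi, there\"\"\",open"
).toDS()

// quote  = character that encloses a field containing the delimiter
// escape = character that escapes a quote inside a quoted field;
//          setting it to " makes "" read as a single literal quote.
val df: DataFrame = spark.read
  .option("header", "true")
  .option("quote", "\"")
  .option("escape", "\"")
  .csv(lines)

df.show(false)
```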

Re: Spark Read CSV doesn't preserve the double quotes while reading!

ManishRana — Fri, 21 Jan 2022 10:29:00 GMT

Thanks, this resolved my issue with the CSV generation.

Re: Spark Read CSV doesn't preserve the double quotes while reading!

Munni — Thu, 15 Sep 2022 05:15:06 GMT

Hi, I am currently facing the same issue. Please let me know how it was resolved.

Thanks,

Munni

Re: Spark Read CSV doesn't preserve the double quotes while reading!

LearningAj — Thu, 10 Aug 2023 19:08:21 GMT

Hi Team,

I am also facing the same issue, and I have applied all the options mentioned in the posts above.

I will just post my dataset here:

Attached is my input data with 3 columns. The comment column contains text with double quotes and commas. To read this dataset I have used all the escape options, but the comment column's data still moves into the third column.

Below is the dataset after reading the CSV:

Could you please help with this issue ASAP?