Spark Read CSV doesn't preserve the double quotes while reading!
08-24-2020 09:52 AM
Hi, I am trying to read a CSV file in which one column contains values wrapped in double quotes, like below.
James,Butt,"Benton, John B Jr",6649 N Blue Gum St
Josephine,Darakjy,"Chanay, Jeffrey A Esq",4 B Blue Ridge Blvd
Art,Venere,"Chemel, James L Cpa",8 W Cerritos Ave #54
Lenna,Paprocki,Feltz Printing Service,639 Main St,Anchorage
Donette,Foller,Printing Dimensions,34 Center St,Hamilton
Simona,Morasca,"Chapman, Ross E Esq",3 Mcauley Dr
I am using the code below and want to keep the double quotes exactly as they appear in the CSV file (some rows have double quotes and some don't).
val df_usdata = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", "\"")
  .load("file:///E://data//csvdata.csv")
df_usdata.show(false)
However, the double quotes are not preserved in the DataFrame, and I need them to be.
The .option("quote","\"") setting does not seem to have any effect. I am using Spark 2.3.1.
The output should look like this:
+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name             |address              |
+----------+---------+-------------------------+---------------------+
|James     |Butt     |"Benton, John B Jr"      |6649 N Blue Gum St   |
|Josephine |Darakjy  |"Chanay, Jeffrey A Esq"  |4 B Blue Ridge Blvd  |
|Art       |Venere   |"Chemel, James L Cpa"    |8 W Cerritos Ave #54 |
|Lenna     |Paprocki |Feltz Printing Service   |639 Main St          |
|Donette   |Foller   |Printing Dimensions      |34 Center St         |
|Simona    |Morasca  |"Chapman, Ross E Esq"    |3 Mcauley Dr         |
+----------+---------+-------------------------+---------------------+
Regards, Dinesh Kumar
08-25-2020 09:02 AM
When I tried
.option("quote","")
and .option("quote","\u0000"), the company_name values were split across into the next column, as shown below.
+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name             |address              |
+----------+---------+-------------------------+---------------------+
|James     |Butt     |"Benton                  | John B Jr"          |
|Josephine |Darakjy  |"Chanay                  | Jeffrey A Esq"      |
|Art       |Venere   |"Chemel                  | James L Cpa"        |
|Lenna     |Paprocki |Feltz Printing Service   |639 Main St          |
|Donette   |Foller   |Printing Dimensions      |34 Center St         |
|Simona    |Morasca  |"Chapman                 | Ross E Esq"         |
+----------+---------+-------------------------+---------------------+
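A note on the behaviour above: the quote option tells the CSV parser which character wraps a field, so the parser consumes the quotes while reading. With quoting disabled, the comma inside the quotes is treated as an ordinary delimiter, which is why the value spills into the next column. One possible workaround, sketched below, is to read with the default quoting and then put the quotes back on values that contain a comma. This assumes the source file quoted only the comma-containing fields. The RequoteSketch object, the requote helper, and the inline sample rows (with a header line taken from the expected output) are illustrative and not from the original posts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, when}

// Hypothetical helper object, for illustration only.
object RequoteSketch {

  // Wrap a column's value in double quotes when it contains a comma.
  // Assumption: only comma-containing fields were quoted in the source file.
  def requote(df: DataFrame, column: String): DataFrame =
    df.withColumn(
      column,
      when(col(column).contains(","), concat(lit("\""), col(column), lit("\"")))
        .otherwise(col(column))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("requote-sketch")
      .getOrCreate()
    import spark.implicits._

    // Inline sample standing in for the CSV file; the header line is assumed.
    val lines = Seq(
      "first_name,last_name,company_name,address",
      "James,Butt,\"Benton, John B Jr\",6649 N Blue Gum St",
      "Lenna,Paprocki,Feltz Printing Service,639 Main St"
    ).toDS()

    // Default quoting keeps "Benton, John B Jr" in one column but drops the quotes.
    val df = spark.read.option("header", "true").csv(lines)

    // Put the quotes back on the values that need them.
    requote(df, "company_name").show(false)

    spark.stop()
  }
}
```

If every value in the column should carry quotes regardless of content, the when condition can be dropped and the concat applied unconditionally.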
08-06-2021 02:16 AM
Try using both of these options:
.option("quote", "\"").option("escape", "\"")
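For anyone trying this suggestion, here is a minimal sketch of what the two options do on read. Setting escape to the double quote makes the parser treat a doubled quote ("") inside a quoted field as one literal quote, so text containing both quotes and commas stays in a single column. Note that the outer quotes are still consumed by the parser. The EscapedQuotesSketch object, the readWithQuoteEscape helper, the column names (id, comment, status), and the sample row are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object EscapedQuotesSketch {

  // Read CSV lines with the double quote as both quote and escape character.
  def readWithQuoteEscape(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read
      .option("header", "true")
      .option("quote", "\"")
      .option("escape", "\"")
      .csv(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("escaped-quotes-sketch")
      .getOrCreate()

    // Raw second line: 1,"He said ""hello"", then left",open
    val lines = Seq(
      "id,comment,status",
      "1,\"He said \"\"hello\"\", then left\",open"
    )

    // The comment column keeps its inner quotes and comma in one field.
    readWithQuoteEscape(spark, lines).show(false)

    spark.stop()
  }
}
```

This only helps when the inner quotes in the file are doubled. If the file escapes them differently (for example with a backslash), the escape option would need to match that character instead.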
01-21-2022 02:29 AM
Thanks, it resolves my issue with the csv generation
09-14-2022 10:15 PM
Hi, I am currently facing the same issue. Please let me know how it was resolved.
Thanks,
Munni
08-10-2023 12:08 PM
Hi Team,
I am facing the same issue, and I have applied all the options mentioned in the posts above.
Here is my dataset:
Attached is my input data. It has 3 columns, and the comment column contains text with double quotes and commas. I have tried all the escape options when reading this dataset, but the comment column's data still spills into the third column.
Below is the dataset from the CSV after the read:
Could you please help with this issue as soon as possible?