08-24-2020 09:52 AM
Hi, I am trying to read a CSV file in which one column has values wrapped in double quotes, like below.
James,Butt,"Benton, John B Jr",6649 N Blue Gum St
Josephine,Darakjy,"Chanay, Jeffrey A Esq",4 B Blue Ridge Blvd
Art,Venere,"Chemel, James L Cpa",8 W Cerritos Ave #54
Lenna,Paprocki,Feltz Printing Service,639 Main St,Anchorage
Donette,Foller,Printing Dimensions,34 Center St,Hamilton
Simona,Morasca,"Chapman, Ross E Esq",3 Mcauley Dr
I am using the code below and want to keep the double quotes exactly as they appear in the CSV file (some rows have double quotes and some don't).
val df_usdata = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", "\"")
  .load("file:///E://data//csvdata.csv")
df_usdata.show(false)
But the double quotes are not preserved in the DataFrame, although I expected them to be.
The .option("quote","\"") setting does not seem to work. I am using Spark 2.3.1.
The output should be like below.
+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name |address |
+----------+---------+-------------------------+---------------------+
|James |Butt |"Benton, John B Jr" |6649 N Blue Gum St |
|Josephine |Darakjy |"Chanay, Jeffrey A Esq" |4 B Blue Ridge Blvd |
|Art |Venere |"Chemel, James L Cpa" |8 W Cerritos Ave #54 |
|Lenna |Paprocki |Feltz Printing Service |639 Main St |
|Donette |Foller |Printing Dimensions |34 Center St |
|Simona |Morasca |"Chapman, Ross E Esq" |3 Mcauley Dr |
+----------+---------+-------------------------+---------------------+
Regards, Dinesh Kumar
08-25-2020 09:02 AM
When I tried .option("quote","") and .option("quote","\u0000"), the company_name column values were split into the next column, like below.
+----------+---------+-------------------------+---------------------+
|first_name|last_name|company_name |address |
+----------+---------+-------------------------+---------------------+
|James |Butt |"Benton | John B Jr" |
|Josephine |Darakjy |"Chanay | Jeffrey A Esq" |
|Art |Venere |"Chemel | James L Cpa" |
|Lenna |Paprocki |Feltz Printing Service |639 Main St |
|Donette |Foller |Printing Dimensions |34 Center St |
|Simona |Morasca |"Chapman | Ross E Esq" |
+----------+---------+-------------------------+---------------------+
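A note on why this happens: the CSV reader treats the quote character as syntax and strips it while parsing, so the quote option only names that character and does not keep it in the values. Turning quoting off makes the comma inside the quotes an ordinary delimiter, which matches the split shown above. One possible workaround, sketched below, is to read with normal quoting and put the quotes back afterwards. This assumes every value that contains a comma was quoted in the source file; the object name KeepQuotes, the helper requote, and the local master setting are illustrative, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, when}

object KeepQuotes {
  // Hypothetical helper: wrap a value in double quotes when it contains the
  // delimiter. Assumption: such values were quoted in the source file.
  def requote(df: DataFrame, column: String, delimiter: String = ","): DataFrame =
    df.withColumn(
      column,
      when(col(column).contains(delimiter),
        concat(lit("\""), col(column), lit("\"")))
        .otherwise(col(column)))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val spark = SparkSession.builder()
      .appName("keep-quotes")
      .master("local[*]")
      .getOrCreate()

    // Read with quoting enabled so "Benton, John B Jr" stays in one column.
    val df = spark.read
      .option("header", "true")
      .option("quote", "\"")
      .csv("file:///E://data//csvdata.csv")

    // Re-add the quotes that the parser removed.
    requote(df, "company_name").show(false)

    spark.stop()
  }
}
```

Values without a comma (for example Feltz Printing Service) are left untouched, so a value that was quoted in the file but has no comma would not get its quotes back with this approach.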
08-06-2021 02:16 AM
Try using both of these options:
.option("quote", "\"") .option("escape", "\"")
01-21-2022 02:29 AM
Thanks, it resolves my issue with the csv generation
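For the CSV generation case mentioned here, the same two options can be set on the writer. The following is a minimal sketch rather than the poster's actual code: the object name WriteQuotedCsv, the helper writeCsv, and the outputPath parameter are hypothetical. With the escape character set to the quote character, an embedded double quote should be written doubled ("") instead of backslash-escaped.

```scala
import org.apache.spark.sql.DataFrame

object WriteQuotedCsv {
  // Hypothetical helper: write a DataFrame as CSV using the double quote as
  // both the quote character and the escape character.
  def writeCsv(df: DataFrame, outputPath: String): Unit =
    df.write
      .option("header", "true")
      .option("quote", "\"")   // character used to enclose values
      .option("escape", "\"")  // embedded quotes are escaped by doubling
      .csv(outputPath)
}
```

Reading the output back with the same quote and escape options should return the original values, since reader and writer then agree on how embedded quotes are represented.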
09-14-2022 10:15 PM
Hi, I am currently facing the same issue. Please let me know how it was resolved.
Thanks,
Munni
08-10-2023 12:08 PM
Hi Team,
I am also facing the same issue, and I have applied all the options mentioned in the posts above.
I will just post my dataset here:
Attached is my input data with 3 different columns, of which the comment column contains text values with double quotes and commas. To read this dataset I have used all the escape options, but the comment column's data still moves into the third column.
Below is the dataset from the CSV after performing the read:
Could you please help with this issue ASAP?
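The dataset is not visible in this thread, so the following is only a guess at a setup worth trying. It combines the quote and escape options suggested earlier with multiLine, in case the comment text also contains line breaks inside the quotes. The sample row, the column names id, comment and status, and the object name ReadCommentColumn are made up for illustration and are not the poster's data.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadCommentColumn {
  // Hypothetical helper: read a CSV whose quoted fields may contain commas,
  // doubled quotes ("") and line breaks.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("quote", "\"")
      .option("escape", "\"")       // embedded quotes appear doubled
      .option("multiLine", "true")  // quoted fields may span several lines
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val spark = SparkSession.builder()
      .appName("read-comment-column")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample: the comment holds both a comma and embedded quotes.
    val sample = "id,comment,status\n1,\"He said \"\"ok\"\", then left\",open\n"
    val file = Files.createTempFile("sample", ".csv")
    Files.write(file, sample.getBytes("UTF-8"))

    readCsv(spark, file.toUri.toString).show(false)

    spark.stop()
  }
}
```

If the comment still spills into the third column with these options, the embedded quotes in the file are probably not doubled or otherwise escaped, in which case no reader option can tell where the field ends and the file would need to be fixed at the source.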