Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Python read CSV - don't treat a comma as a separator when it's within quotes, even if the quotes are not adjacent to the separator

ASN
New Contributor II

I have data like the below. When reading it as a CSV, I don't want a comma to be treated as a separator when it's within quotes, even if the quotes are not immediately adjacent to the separator (as in record #2). Records 1 and 3 parse fine with the default separator, but record 2 fails.

Input:

col1, col2, col3

a, b, c

a, b1 "b2, b3" b4, c

"a1, a2", b, c

Output:

(attached screenshot: "Input and expected Output")
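For reference, the failure mode can be reproduced outside Spark with Python's built-in csv module (a minimal illustration only, not the Spark reader): because the quotes in record 2 don't start at the field boundary, a standard CSV parser treats them as literal characters and still splits on the comma inside them.

```python
import csv

# Record 2 from the sample input: the quotes appear mid-field,
# so the parser treats them as ordinary characters.
row = 'a, b1 "b2, b3" b4, c'

fields = next(csv.reader([row]))
print(fields)
# The comma inside the quotes still acts as a separator,
# yielding 4 fields instead of the desired 3.
```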


Anonymous
Not applicable

ASN
New Contributor II

Hi Joseph, I tried that, but the row a, b1 "b2, b3" b4, c needs to parse into 3 columns as below (expected output). Instead, the b-series data is split across 2 columns rather than kept in one; the requirement is to ignore the comma inside the quotes in the 2nd column.

Expected output:

1) a

2) b1 "b2, b3" b4

3) c

Actual output:

1) a

2) b1 "b2

3) b3" b4

Thanks,

Satya
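As an aside, the expected split above can be reproduced in plain Python with a regular expression that only splits on commas followed by an even number of quote characters, i.e. commas that sit outside any quotes (a sketch for illustration; the input row is taken from the thread):

```python
import re

# Split only on commas followed by an even number of quote
# characters up to the end of the line, i.e. commas outside quotes.
SPLIT_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

row = 'a, b1 "b2, b3" b4, c'
fields = [f.strip() for f in SPLIT_OUTSIDE_QUOTES.split(row)]
print(fields)  # ['a', 'b1 "b2, b3" b4', 'c']
```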

dhara1314
New Contributor II

The following approach can be taken:

  1. Replace your delimiter from a comma to something else, like a pipe or a semicolon.
  2. Set the escapeQuote option to true when you use spark.read.
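Step 1 above could be sketched as a pre-processing pass in plain Python (an illustration under the assumption that you rewrite the file line by line before handing it to spark.read): replace only the commas outside quotes with a pipe, then read the result with sep="|".

```python
import re

# Replace only commas outside quotes with a pipe; the rewritten
# file can then be read with sep="|" (step 1 of the approach above).
OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

line = 'a, b1 "b2, b3" b4, c'
repiped = OUTSIDE_QUOTES.sub('|', line)
print(repiped)  # a| b1 "b2, b3" b4| c
```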

Hi @SATYANARAYANA ALAMANDA​,

Just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, let us know if you still need help.

Pholo
Contributor

Hi, I think you can use this option for the CSV reader:

spark.read.options(header=True, sep=",", unescapedQuoteHandling="BACK_TO_DELIMITER").csv("your_file.csv")

especially the unescapedQuoteHandling option. You can find the other options at this link:

https://spark.apache.org/docs/latest/sql-data-sources-csv.html
