Handle comma inside cell of CSV

AnandJ_Kadhi
New Contributor II

We are using spark-csv_2.10, version 1.5.0, to read a CSV file in which one column contains a comma (",") as part of its value. Everything after that comma on the line is treated as a new column, so the data is not parsed correctly.

Can you please suggest a solution?

2 REPLIES

osamakhn
New Contributor II

I have been working around this with an intermediate pandas function, but a Spark solution would be helpful! I am willing to contribute as well if anyone can point me in the right direction.

User16857282152
Contributor

Take a look at the reader options documented here:

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframereader#pyspark.sq...

If a CSV file has values that contain commas, the convention is to quote the string that contains the comma.
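To make the quoting convention concrete, here is a minimal sketch using Python's standard csv module rather than Spark. The sample data (an address cell with a comma in it) is made up for illustration.

```python
import csv
import io

# Hypothetical sample: the address cell contains a comma, so it is wrapped in quotes.
raw = 'id,address,city\n1,"12 Main St, Apt 4",Pune\n'

# quotechar tells the parser that separators inside quotes belong to the cell.
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

# The data row still has three cells; the quoted comma stays inside the address.
print(rows[1])
```

If the file is not quoted like this at the source, no reader option can tell a separator from a comma that is part of the value.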

In particular, try adding some of the options from that documentation, such as:

quote – sets a single character used for escaping quoted values where the separator can be part of the value. If None is set, it uses the default value, ". If you would like to turn off quotations, you need to set an empty string.
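As a rough sketch of how that option might be passed with the DataFrameReader from the linked documentation: the helper name read_quoted_csv, the CSV_OPTIONS dictionary, and the header setting are assumptions for illustration, and spark is expected to be an already-created SparkSession.

```python
# Assumed reader options. "header" is a guess about the file layout;
# "quote" is the option described above (a double quote is also the default).
CSV_OPTIONS = {
    "header": "true",
    "quote": '"',
}


def read_quoted_csv(spark, path):
    """Read a CSV whose comma-containing cells are wrapped in double quotes.

    `spark` is an existing SparkSession and `path` is a hypothetical file path.
    """
    return spark.read.options(**CSV_OPTIONS).csv(path)
```

With the external spark-csv package from the question, the same quote option can probably be set on a reader created with format("com.databricks.spark.csv"), but check that against the package README for your version.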

Also, you may have poorly formatted data. In that case you might need to read each whole line as a string into a single-column dataframe, then use string-splitting tools to build the final dataframe.
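One possible way to do that splitting, sketched in plain Python: it assumes the number of columns is known and that only one column can contain unquoted commas. The function name split_line and its parameters are hypothetical. In Spark, the raw lines could be loaded with spark.read.text (which gives a single column named value) and a function like this applied to each row.

```python
def split_line(line, n_cols, text_col):
    """Split one raw CSV line when only column `text_col` may contain commas.

    `n_cols` is the expected number of columns and `text_col` is the
    zero-based index of the free-text column.
    """
    n_after = n_cols - text_col - 1

    # Peel off the cells before the free-text column, scanning from the left.
    head = line.split(",", text_col)
    if len(head) != text_col + 1:
        raise ValueError("too few columns: %r" % line)
    lead, rest = head[:text_col], head[text_col]

    # Peel off the cells after the free-text column, scanning from the right.
    tail = rest.rsplit(",", n_after)
    if len(tail) != n_after + 1:
        raise ValueError("too few columns: %r" % line)

    # Whatever is left in the middle is the free-text cell, commas included.
    return lead + tail


print(split_line("1,12 Main St, Apt 4,Pune", n_cols=3, text_col=1))
```

This only works when at most one column is ambiguous; if two columns can both contain unquoted commas, the line cannot be split reliably and the data needs to be fixed at the source.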
