data.write.format('com.databricks.spark.csv') added additional quotation marks

WenLin · ‎06-06-2016

I am using the following code (PySpark) to export my DataFrame to CSV:

data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')

Note that I use delimiter="\t", as I don't want to add additional quotation marks around each field. However, when I checked the output CSV file, some fields were still enclosed in quotation marks, e.g.:

abcdABCDAAbbcd ....

1234_3456ABCD  ...

"-12345678AbCd"...

It seems that the quotation marks appear when the leading character of a field is "-". Why is this happening, and is there a way to avoid it? Thanks!
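A possible explanation, offered as a sketch rather than a confirmed diagnosis: spark-csv appears to delegate writing to Apache Commons CSV, whose default MINIMAL quote mode (as of version 1.1) quotes the first field of a record when it starts with anything other than an ASCII letter or digit. The Python function below only approximates that check for illustration. The name minimal_mode_would_quote and its parameters are hypothetical, and the rule is an assumption that has not been verified against the library version used here.

```python
import string

# Characters that may start the first field of a record without triggering quotes.
ASCII_ALNUM = set(string.ascii_letters + string.digits)


def minimal_mode_would_quote(value, is_first_field, delimiter="\t", quote='"'):
    """Approximate the MINIMAL-mode quoting check (assumed: Commons CSV 1.1)."""
    if not value:
        # An empty first field is quoted so the line is not read as a blank record.
        return is_first_field
    first = value[0]
    # First field of a record: quoted unless it starts with an ASCII letter or digit.
    if is_first_field and first not in ASCII_ALNUM:
        return True
    # Any field starting at or below '#' (the default comment character) is quoted.
    if first <= "#":
        return True
    # Fields containing a line break, the quote character, or the delimiter are quoted.
    if any(ch in ("\n", "\r", quote, delimiter) for ch in value):
        return True
    # Fields ending in a space or a control character are quoted.
    return value[-1] <= " "


# The leading values from the sample output above, treated as first fields.
for field in ["abcdABCDAAbbcd", "1234_3456ABCD", "-12345678AbCd"]:
    print(field, minimal_mode_would_quote(field, is_first_field=True))
```

Under this assumption only "-12345678AbCd" is flagged, and only because it is the first field on its line. The same value in a later column would not be quoted. The spark-csv README also documents a quoteMode option (ALL, MINIMAL, NON_NUMERIC, NONE), so adding quoteMode="NONE" to the options(...) call may suppress the quotes, although that has not been tested here.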
