Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Escape Backslash(/) while writing spark dataframe into csv

HarisKhan
New Contributor

I am using Spark version 2.4.0. I know that backslash is the default escape character in Spark, but I am still facing the issue below.

I am reading a CSV file into a Spark dataframe (using PySpark) and writing the dataframe back to CSV. My source CSV file contains some "//" values (shown below), where the first backslash represents the escape character and the second backslash is the actual value.

Test.csv (Source Data)

Col1,Col2,Col3,Col4

1,"abc//",xyz,Val2

2,"//",abc,Val2

I read the Test.csv file and create the dataframe using the code below:

df = sqlContext.read.format('com.databricks.spark.csv').schema(schema).option("escape", "\\").options(header='true').load("Test.csv")

I then write the df dataframe back to Output.csv using the code below:

df.repartition(1).write.format('csv').option("emptyValue", empty).option("header", "false").option("escape", "\\").option("path", 'D:\TestCode\Output.csv').save(header = 'true')

Output.csv

Col1,Col2,Col3,Col4

1,"abc//",xyz,Val2

2,/,abc,Val2

In the 2nd row of Output.csv, the escape character is lost along with the quotes (""). My requirement is to retain the escape character in Output.csv as well.

Any kind of help will be much appreciated. Thanks in advance.

2 REPLIES

sean_owen
Honored Contributor II

I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?
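To see why the choice of character matters, here is a minimal sketch using Python's standard csv module as a stand-in for Spark's CSV reader and writer (an assumption: Spark's parser is a different implementation and may differ in details such as when it keeps quotes). The sample rows are hypothetical and modelled on row 2 of Test.csv. With backslashes in the data and backslash as the escape, the escape is consumed on read, so only one character is left to write back. With forward slashes, a backslash escape never touches the value.

```python
import csv
import io

# Assumed sample row: the quoted field holds two backslashes
# (the escape character followed by the literal backslash).
raw = '2,"\\\\",abc,Val2\n'

# Reading with backslash as the escape character consumes the first
# backslash and keeps the second one as data.
row = next(csv.reader(io.StringIO(raw), escapechar="\\"))

# Writing with default settings (minimal quoting, no escape character)
# emits the single remaining backslash as-is, without quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
written = buf.getvalue()

# The same row with forward slashes is left alone by a backslash escape.
fwd_row = next(csv.reader(io.StringIO('2,"//",abc,Val2\n'), escapechar="\\"))

print(row)      # parsed fields of the backslash row
print(written)  # the line as it would be written back
print(fwd_row)  # parsed fields of the forward-slash row
```

If the quotes need to survive the round trip in Spark, the CSV writer's quoteAll option forces quotes around every value. Whether that alone reproduces the original escaping has not been tested here.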

Granilpa
New Contributor II

When I write my Databricks output to the cloud via Python and then read it into Power BI, I get extra '\' characters. How do I eliminate the extra slashes? I seem to get them in null columns ('\\'), plus an extra one in the NTID field, e.g. Company\\NtId (one extra '\'). I don't want to remove them all, just the ones in null fields and the extra one described above. Help!
