<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Escape Backslash(/) while writing spark dataframe into csv in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27209#M19089</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 17 Apr 2020 21:19:09 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2020-04-17T21:19:09Z</dc:date>
    <item>
      <title>Escape Backslash(/) while writing spark dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27208#M19088</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am using Spark 2.4.0. I know that backslash is the default escape character in Spark, but I am still facing the issue below.&lt;/P&gt;
&lt;P&gt;I am reading a CSV file into a Spark DataFrame (using PySpark) and writing the DataFrame back to CSV. My source CSV file contains some "//" sequences (shown below), where the first backslash represents the escape character and the second backslash is the actual value.&lt;/P&gt;
&lt;P&gt;Test.csv (Source Data) &lt;/P&gt;
&lt;P&gt;Col1,Col2,Col3,Col4 &lt;/P&gt;
&lt;P&gt;1,"abc//",xyz,Val2 &lt;/P&gt;
&lt;P&gt;2,"//",abc,Val2 &lt;/P&gt;
&lt;P&gt;I read the Test.csv file and create the DataFrame with the following code:&lt;/P&gt;
&lt;P&gt;df = sqlContext.read.format('com.databricks.spark.csv').schema(schema).option("escape", "\\").options(header='true').load("Test.csv") &lt;/P&gt;
&lt;P&gt;I then write the df DataFrame back to the Output.csv file with the following code:&lt;/P&gt;
&lt;P&gt;df.repartition(1).write.format('csv').option("emptyValue", empty).option("header", "false").option("escape", "\\").option("path", 'D:\TestCode\Output.csv').save(header = 'true') &lt;/P&gt;
&lt;P&gt;Output.csv &lt;/P&gt;
&lt;P&gt;Col1,Col2,Col3,Col4 &lt;/P&gt;
&lt;P&gt;1,"abc//",xyz,Val2 &lt;/P&gt;
&lt;P&gt;2,/,abc,Val2 &lt;/P&gt;
&lt;P&gt;In the 2nd row of Output.csv, the escape character is lost along with the quotes (""). I need to retain the escape character in Output.csv as well.&lt;/P&gt;
&lt;P&gt;Any help would be much appreciated. Thanks in advance.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Apr 2020 12:32:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27208#M19088</guid>
      <dc:creator>HarisKhan</dc:creator>
      <dc:date>2020-04-12T12:32:03Z</dc:date>
    </item>
    <item>
      <title>Re: Escape Backslash(/) while writing spark dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27209#M19089</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?&lt;/P&gt; 
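&lt;P&gt;To make the distinction concrete, here is a minimal sketch using Python's standard csv module, not Spark's CSV reader, so it only illustrates the general idea of an escape character and may not match Spark's behavior exactly. The rows are copied from the Test.csv in the question; the helper name parse is hypothetical. It shows that a parser only treats "//" as an escaped slash when "/" is the configured escape character.&lt;/P&gt;

```python
import csv
import io

# Sample rows copied from the question's Test.csv.
SOURCE = 'Col1,Col2,Col3,Col4\n1,"abc//",xyz,Val2\n2,"//",abc,Val2\n'


def parse(text, escapechar=None):
    """Parse CSV text with Python's csv module (an illustration, not Spark's parser)."""
    reader = csv.reader(io.StringIO(text), escapechar=escapechar)
    return list(reader)


# No escape character configured: both forward slashes are kept as data.
plain = parse(SOURCE)

# "/" configured as the escape character: the first slash of each pair is
# consumed as the escape and only the second one survives as data.
escaped = parse(SOURCE, escapechar="/")

print(plain[1][1], plain[2][1])
print(escaped[1][1], escaped[2][1])
```

&lt;P&gt;If the data really contains forward slashes, a backslash escape setting would never apply to them, which is the mismatch I am asking about.&lt;/P&gt;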
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2020 21:19:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27209#M19089</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2020-04-17T21:19:09Z</dc:date>
    </item>
    <item>
      <title>Re: Escape Backslash(/) while writing spark dataframe into csv</title>
      <link>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27210#M19090</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;When I write my Databricks output to the cloud via Python and then read it into Power BI, I get extra '\' characters. How do I eliminate the extra backslashes? I seem to get '\\' in null columns and one extra backslash in the NTID field, e.g. Company\\NtId. I don't want to remove them all, only those in null fields and the extra one described above. Help!&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jan 2021 16:13:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/escape-backslash-while-writing-spark-dataframe-into-csv/m-p/27210#M19090</guid>
      <dc:creator>Granilpa</dc:creator>
      <dc:date>2021-01-05T16:13:20Z</dc:date>
    </item>
  </channel>
</rss>

