<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to prevent spark-csv from adding quotes to JSON string in dataframe in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30004#M21688</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I was able to turn that off by setting the quote option to a single space. The problem with this is that I am not sure how you can escape strings that contain your delimiter - "," - or whatever you set it to. If you are sure none of your strings contain the delimiter, you should be fine.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;(df
  .repartition(1)
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", " ")
  .save("/FileStore/test"))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 06 Nov 2015 18:29:25 GMT</pubDate>
    <dc:creator>vida</dc:creator>
    <dc:date>2015-11-06T18:29:25Z</dc:date>
    <item>
      <title>How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30002#M21686</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have a SQL DataFrame with a column that has a JSON string in it (e.g. {"key":"value"}). When I use spark-csv to save the DataFrame, it changes the field values to "{""key"":""value""}". Is there a way to turn that off?&lt;/P&gt;
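&lt;P&gt;For context, the doubled quotes look like standard CSV quoting rather than corruption: a field that contains the quote character is wrapped in quotes and each embedded quote is doubled, and a CSV reader reverses this on load. A minimal sketch of that round trip, using Python's built-in csv module as a stand-in for spark-csv (an assumption; the variable names are illustrative only):&lt;/P&gt;

```python
import csv
import io

# Hypothetical row: an id column plus a column holding a JSON string.
json_text = '{"key":"value"}'

# Write one row with the csv module's default quoting rules.
buffer = io.StringIO()
csv.writer(buffer).writerow(["1", json_text])
line = buffer.getvalue()
print(repr(line))

# Read the same line back; the reader strips the outer quotes and
# collapses each doubled quote to a single one.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

&lt;P&gt;If the consumer of the file is a CSV-aware reader, the quoted form may therefore be harmless; disabling quoting mostly matters when the consumer reads the column as raw text.&lt;/P&gt;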
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 Nov 2015 18:43:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30002#M21686</guid>
      <dc:creator>mlm</dc:creator>
      <dc:date>2015-11-02T18:43:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30003#M21687</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Try creating a custom schema that represents that column as a JSONObject and applying that schema when you create the DataFrame&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 18:25:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30003#M21687</guid>
      <dc:creator>PohlPosition</dc:creator>
      <dc:date>2015-11-06T18:25:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30004#M21688</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I was able to turn that off by setting the quote option to a single space. The problem with this is that I am not sure how you can escape strings that contain your delimiter - "," - or whatever you set it to. If you are sure none of your strings contain the delimiter, you should be fine.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;(df
  .repartition(1)
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("quote", " ")
  .save("/FileStore/test"))&lt;/CODE&gt;&lt;/PRE&gt;
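&lt;P&gt;The delimiter caveat can be illustrated with a small sketch. It does not call Spark; it assumes that a writer with quoting effectively disabled emits each field verbatim, and the helper name write_unquoted and the sample row are hypothetical:&lt;/P&gt;

```python
def write_unquoted(fields, delimiter=","):
    """Join fields with no quoting or escaping at all."""
    return delimiter.join(fields)


# Hypothetical row whose JSON column contains the default delimiter.
row = ["42", '{"key":"value","other":"x"}']

# With a comma delimiter, a naive reader sees three columns instead of two.
broken = write_unquoted(row).split(",")
print(len(broken), broken)

# With a delimiter that never occurs in the data (a tab here), the row
# survives the round trip unchanged.
safe = write_unquoted(row, "\t").split("\t")
print(safe == row)
```

&lt;P&gt;So this approach seems safe only when the chosen delimiter cannot appear in any field, which is rarely true for JSON if the delimiter is a comma.&lt;/P&gt;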
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 18:29:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30004#M21688</guid>
      <dc:creator>vida</dc:creator>
      <dc:date>2015-11-06T18:29:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30005#M21689</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Yes. To turn off the default escaping of the double quote character (") with the backslash character (\), add an .option() method call with just the right parameters after the .write call. The goal of the option() call is to change how the csv() method "finds" instances of the "quote" character. To do this, you change the default of what a "quote" actually means; i.e. change the character sought from a double quote character (") to the Unicode "\u0000" character (essentially providing the &lt;A target="_blank" href="https://"&gt;Unicode NUL character &lt;/A&gt;which won't ever occur within a well-formed JSON document).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val dataFrame =
  spark.sql("SELECT * FROM some_table_with_a_json_column")
val unitEmitCsv =
  dataFrame
    .write
    .option("header", true)
    .option("quote", "\u0000") // use NUL as the quote character so embedded double quotes are left untouched
    .csv("/FileStore/temp.csv")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This was only one of several lessons I learned attempting to work with Apache Spark and emitting .csv files. For more information and context on this, please see the blog post I wrote titled "&lt;A target="_blank" href="https://"&gt;Example Apache Spark ETL Pipeline Integrating a SaaS&lt;/A&gt;".&lt;/P&gt; 
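&lt;P&gt;The claim that a raw NUL character does not occur in well-formed JSON can be checked with a short sketch. It assumes Python's standard json module as the serializer (not part of the original pipeline) and a made-up record that deliberately embeds a NUL:&lt;/P&gt;

```python
import json

# Hypothetical record whose value contains a raw NUL control character.
record = {"key": "value", "note": "embedded \x00 control character"}

# Serializing replaces the raw control character with an escape sequence,
# so the JSON text itself contains no NUL.
encoded = json.dumps(record)
print(encoded)

contains_raw_nul = "\x00" in encoded
contains_escaped_nul = "\\u0000" in encoded
print(contains_raw_nul, contains_escaped_nul)

# Decoding restores the original value, so nothing is lost.
print(json.loads(encoded) == record)
```

&lt;P&gt;Because the serialized text never carries a raw NUL, a writer told to look for NUL as its quote character should find nothing to quote or escape in such a column.&lt;/P&gt;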
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 22:58:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30005#M21689</guid>
      <dc:creator>chaotic3quilibr</dc:creator>
      <dc:date>2017-03-30T22:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30006#M21690</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Your answer is actually not only incorrect, it causes the JSON content to become corrupt. So, while it might have solved a highly specific problem you had at the time you were doing this, it isn't a general solution. I have come up with a general solution which I cover in &lt;A target="_blank" href="https://"&gt;my own answer to this question&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 23:00:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30006#M21690</guid>
      <dc:creator>chaotic3quilibr</dc:creator>
      <dc:date>2017-03-30T23:00:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to prevent spark-csv from adding quotes to JSON string in dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30007#M21691</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Do the quote and escape options only work with "write" and not "read"? Our source files contain double quotes. We'd like to add a backslash (escape) in front of each double quote before converting the values from our dataframes to JSON outputs.&lt;/P&gt;
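&lt;P&gt;As a rough illustration of the escaping described here, a JSON serializer normally adds the backslashes itself, so they may not need to be inserted by hand. The sketch below assumes Python's standard json module and a made-up field value; it does not address the Spark read options:&lt;/P&gt;

```python
import json

# Hypothetical field value read from a source file, containing double quotes.
raw_value = 'He said "hi"'

# Serializing to JSON puts a backslash in front of each embedded double quote.
encoded = json.dumps(raw_value)
print(encoded)

# Decoding returns the original text, without the backslashes.
print(json.loads(encoded) == raw_value)
```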
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jun 2018 18:11:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-prevent-spark-csv-from-adding-quotes-to-json-string-in/m-p/30007#M21691</guid>
      <dc:creator>AshleyPan</dc:creator>
      <dc:date>2018-06-14T18:11:44Z</dc:date>
    </item>
  </channel>
</rss>

