<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18020#M11901</link>
    <description>&lt;P&gt;The following approach can be taken:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Change your delimiter from a comma to something else, such as a pipe or semicolon.&lt;/LI&gt;&lt;LI&gt;Set the escapeQuote option to true when you use spark.read.&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Wed, 29 Jun 2022 22:19:06 GMT</pubDate>
    <dc:creator>dhara1314</dc:creator>
    <dc:date>2022-06-29T22:19:06Z</dc:date>
    <item>
      <title>Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18017#M11898</link>
      <description>&lt;P&gt;I have data like the sample below. When reading it as CSV, I don't want a comma to be treated as a separator when it is inside quotes, even if the quotes are not immediately next to the separator (as in record 2). Records 1 and 3 parse correctly with the comma separator, but record 2 fails.&lt;/P&gt;&lt;P&gt;Input:&lt;/P&gt;&lt;P&gt;col1, col2, col3&lt;/P&gt;&lt;P&gt;a, b, c&lt;/P&gt;&lt;P&gt;a, b1 "b2, b3" b4, c&lt;/P&gt;&lt;P&gt;"a1, a2", b, c&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Output:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Input and expected Output"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1801i856474D3759AE2AD/image-size/large?v=v2&amp;amp;px=999" role="button" title="Input and expected Output" alt="Input and expected Output" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2022 21:39:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18017#M11898</guid>
      <dc:creator>ASN</dc:creator>
      <dc:date>2022-06-09T21:39:24Z</dc:date>
    </item>
    <item>
      <title>Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18018#M11899</link>
      <description>&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option" target="test_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Escape quotes is the config you're looking for.  &lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2022 23:39:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18018#M11899</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-06-09T23:39:01Z</dc:date>
    </item>
    <item>
      <title>Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18019#M11900</link>
      <description>&lt;P&gt;Hi Joseph... I tried, but the row &lt;B&gt;a, b1 "b2, b3" b4, c&lt;/B&gt; needs to be converted into 3 columns as shown below (expected output). Instead, the b-series data is split across 2 columns rather than kept in a single column. The requirement is to ignore the comma inside the quotes in the 2nd column.&lt;/P&gt;&lt;P&gt;Expected output:&lt;/P&gt;&lt;P&gt;1) a&lt;/P&gt;&lt;P&gt;2) b1 "b2, b3" b4&lt;/P&gt;&lt;P&gt;3) c&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Actual output:&lt;/P&gt;&lt;P&gt;1) a&lt;/P&gt;&lt;P&gt;2) b1 "b2&lt;/P&gt;&lt;P&gt;3) b3" b4&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Satya&lt;/P&gt;</description>
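The expected output above can be illustrated with a minimal plain-Python sketch that splits a line only on commas that fall outside a pair of double quotes. This is not a Spark reader option and not something proposed in the thread; the function name split_outside_quotes is hypothetical, and the sketch assumes quotes are balanced within each line and should be kept in the output as-is.

```python
def split_outside_quotes(line, sep=",", quote='"'):
    """Split line on sep, ignoring any sep that sits between a pair of quotes."""
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle the quoted state; the quote character itself is kept.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush the last field, which has no trailing separator.
    fields.append("".join(current).strip())
    return fields


# Sample rows taken from the question (assumed to have balanced quotes).
rows = [
    'a, b, c',
    'a, b1 "b2, b3" b4, c',
    '"a1, a2", b, c',
]
parsed = [split_outside_quotes(r) for r in rows]
```

With the sample rows, the second record stays in three fields because the comma between b2 and b3 is inside quotes. Whether the surrounding quotes should be stripped from a field such as "a1, a2" depends on the desired output and is left untouched here.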
      <pubDate>Fri, 10 Jun 2022 13:36:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18019#M11900</guid>
      <dc:creator>ASN</dc:creator>
      <dc:date>2022-06-10T13:36:53Z</dc:date>
    </item>
    <item>
      <title>Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18020#M11901</link>
      <description>&lt;P&gt;The following approach can be taken:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Change your delimiter from a comma to something else, such as a pipe or semicolon.&lt;/LI&gt;&lt;LI&gt;Set the escapeQuote option to true when you use spark.read.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Wed, 29 Jun 2022 22:19:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18020#M11901</guid>
      <dc:creator>dhara1314</dc:creator>
      <dc:date>2022-06-29T22:19:06Z</dc:date>
    </item>
    <item>
      <title>Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18021#M11902</link>
      <description>&lt;P&gt;Hi @SATYANARAYANA ALAMANDA,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2022 18:40:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18021#M11902</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-07-29T18:40:50Z</dc:date>
    </item>
    <item>
      <title>Re: Python Read csv - Don't consider comma when its within the quotes, even if the quotes are not immediate to the separator</title>
      <link>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18022#M11903</link>
      <description>&lt;P&gt;Hi, I think you can use this option for the CSV reader:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.read.options(header=True, sep=",", unescapedQuoteHandling="BACK_TO_DELIMITER").csv("your_file.csv")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Note especially the unescapedQuoteHandling option. You can look up the other options at this link:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html" target="test_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-csv.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2022 14:06:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/python-read-csv-don-t-consider-comma-when-its-within-the-quotes/m-p/18022#M11903</guid>
      <dc:creator>Pholo</dc:creator>
      <dc:date>2022-08-04T14:06:15Z</dc:date>
    </item>
  </channel>
</rss>

