<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CSV Reader reads quoted fields inconsistently in last column in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59262#M31354</link>
    <description>&lt;P&gt;Not providing the escape option would default to "\" which I do not want.&lt;/P&gt;&lt;P&gt;Also, if I provide an invalid option, then I expect an error when doing so, not corrupted data.&lt;/P&gt;</description>
    <pubDate>Mon, 05 Feb 2024 08:50:07 GMT</pubDate>
    <dc:creator>Martinitus</dc:creator>
    <dc:date>2024-02-05T08:50:07Z</dc:date>
    <item>
      <title>CSV Reader reads quoted fields inconsistently in last column</title>
      <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59051#M31323</link>
      <description>&lt;P&gt;I just opened another issue: &lt;A href="https://issues.apache.org/jira/browse/SPARK-46959" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-46959&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It corrupts data even when read with mode="FAILFAST". I consider it critical, because basic stuff like this should just work!&lt;/P&gt;</description>
      <pubDate>Fri, 02 Feb 2024 12:19:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59051#M31323</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-02-02T12:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: CSV Reader reads quoted fields inconsistently in last column</title>
      <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59099#M31337</link>
      <description>&lt;P&gt;You are using the escape option incorrectly.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (spark.read
  .format("csv")
  .option("header","true")
  .option("sep",";")
  .option("encoding","ISO-8859-1")
  .option("lineSep","\r\n")
  .option("nullValue","")
  .option("quote",'"')
  # .option("escape", "")  # omitted here, so the default escape character "\" applies
  .load("/FileStore/1.csv")
)

df.display()

# Output of df.display():
------------------
a,b,c,d
10,"100,00",Some;String,ok
20,"200,00",null,still ok
30,"300,00",also ok,null
40,"400,00",null,null&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-csv.html" target="_blank"&gt;CSV Files - Spark 3.5.0 Documentation (apache.org)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Feb 2024 04:51:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59099#M31337</guid>
      <dc:creator>feiyun0112</dc:creator>
      <dc:date>2024-02-03T04:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: CSV Reader reads quoted fields inconsistently in last column</title>
      <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59262#M31354</link>
      <description>&lt;P&gt;Not providing the escape option would default to "\" which I do not want.&lt;/P&gt;&lt;P&gt;Also, if I provide an invalid option, then I expect an error when doing so, not corrupted data.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Feb 2024 08:50:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59262#M31354</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-02-05T08:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: CSV Reader reads quoted fields inconsistently in last column</title>
      <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59266#M31355</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85621"&gt;@Martinitus&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;Not providing the escape option would default to "\" which I do not want.&lt;/P&gt;&lt;P&gt;Also, if I provide an invalid option, then I expect an error when doing so, not corrupted data.&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;If there is no escape option, how should this string be parsed:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;"some text";some text";some text"&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 05 Feb 2024 09:17:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59266#M31355</guid>
      <dc:creator>feiyun0112</dc:creator>
      <dc:date>2024-02-05T09:17:45Z</dc:date>
    </item>
    <item>
      <title>Re: CSV Reader reads quoted fields inconsistently in last column</title>
      <link>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59278#M31359</link>
      <description>&lt;P&gt;Either: [ 'some text', 'some text"', 'some text"' ]&lt;/P&gt;&lt;P&gt;Alternatively: [ '"some text"', 'some text"', 'some text"' ]&lt;/P&gt;&lt;P&gt;Probably the most sane behavior would be a parser error (with mode="FAILFAST").&lt;/P&gt;&lt;P&gt;Just parsing garbage without warning the user is certainly not a viable option.&lt;/P&gt;&lt;P&gt;I am well aware of the problems with CSV formats in general; it turns out I spend a significant amount of my working time dealing with those issues. Spark is a tool that should make this easier for me, not more difficult &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Feb 2024 11:59:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/csv-reader-reads-quoted-fields-inconsistently-in-last-column/m-p/59278#M31359</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-02-05T11:59:24Z</dc:date>
    </item>
  </channel>
</rss>

