<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: reading a tab separated CSV quietly drops empty rows in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58583#M31205</link>
    <description>&lt;P&gt;Yes, I did not find an official way to report bugs directly to Databricks, but it would be nice if a Databricks engineer could open a corresponding ticket in an internal Jira.&lt;BR /&gt;In our case this is pretty much a showstopper for reading data exported from client SAP systems, as the exported data contains 8 header rows, some of which are empty (they contain only tabs).&lt;BR /&gt;What we planned to do was:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;read the first header row&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;skip the remaining 7 header rows&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;But if one of the header rows is empty, Spark has already dropped it, so skipping 7 rows also skips the first row of real data.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 29 Jan 2024 08:58:08 GMT</pubDate>
    <dc:creator>Martinitus</dc:creator>
    <dc:date>2024-01-29T08:58:08Z</dc:date>
    <item>
      <title>reading a tab separated CSV quietly drops empty rows</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58468#M31171</link>
      <description>&lt;P&gt;I have already reported this as a bug in the official Spark bug tracker:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-46876" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-46876&lt;/A&gt;&lt;/P&gt;&lt;P&gt;A short summary: when reading a tab-separated file that has lines consisting only of tabs, those lines do not show up in the parsed DataFrame; the data is silently dropped.&lt;/P&gt;&lt;P&gt;I am not sure whether this is a Spark issue or occurs only in Databricks &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2024 11:03:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58468#M31171</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-01-26T11:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: reading a tab separated CSV quietly drops empty rows</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58474#M31175</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/85621"&gt;@Martinitus&lt;/a&gt;, thank you for reporting this. It looks like a potential bug and should be addressed via the Jira ticket.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2024 13:41:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58474#M31175</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2024-01-26T13:41:36Z</dc:date>
    </item>
    <item>
      <title>Re: reading a tab separated CSV quietly drops empty rows</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58583#M31205</link>
      <description>&lt;P&gt;Yes, I did not find an official way to report bugs directly to Databricks, but it would be nice if a Databricks engineer could open a corresponding ticket in an internal Jira.&lt;BR /&gt;In our case this is pretty much a showstopper for reading data exported from client SAP systems, as the exported data contains 8 header rows, some of which are empty (they contain only tabs).&lt;BR /&gt;What we planned to do was:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;read the first header row&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;skip the remaining 7 header rows&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;But if one of the header rows is empty, Spark has already dropped it, so skipping 7 rows also skips the first row of real data.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2024 08:58:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/58583#M31205</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-01-29T08:58:08Z</dc:date>
    </item>
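    <!--
    Editorial sketch, not part of the original posts: one possible way to carry out
    the plan described above (read the first header row, skip the remaining 7)
    outside of Spark, so that tab-only lines keep their position. It uses only the
    Python standard library. The function name split_export and the sample layout
    are hypothetical; the count of 8 header rows comes from the post. It assumes
    the export is small enough to parse on the driver.

```python
import csv
import io


def split_export(text, header_rows=8):
    """Split a tab-separated export into (column_names, data_rows).

    Every physical line is kept, including lines that contain only tabs,
    so skipping a fixed number of header rows stays aligned with the file.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < header_rows:
        raise ValueError("file has fewer lines than the expected header block")
    column_names = rows[0]          # first header row holds the column names
    data_rows = rows[header_rows:]  # everything after the header block is data
    return column_names, data_rows


# Hypothetical sample: 8 header rows (two of them tab-only), then 3 data rows.
sample = "\n".join([
    "a\tb\tc", "x\ty\tz", "\t\t", "h4\t\t",
    "\t\t", "h6\t\t", "h7\t\t", "h8\t\t",
    "1\t2\t3", "\t\t", "4\t5\t6",
]) + "\n"

columns, data = split_export(sample)
```

    The resulting lists could then be handed to Spark, for example with
    spark.createDataFrame(data, schema=columns), as a stopgap until the fix is
    released.
    -->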
    <item>
      <title>Re: reading a tab separated CSV quietly drops empty rows</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/62243#M31932</link>
      <description>&lt;P&gt;Hi Databricks team. Someone already fixed the bug and opened a PR on GitHub a couple of weeks ago. Maybe someone can have a look at it and merge it. It's just a minor fix, but as soon as it is rolled out it will resolve some major obstacles on our end caused by this bug. We can then drop a manual PowerShell script that we currently have to run on user notebooks before the data gets uploaded to the workspace (which is a big PITA, to be honest).&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/pull/44946" target="_blank" rel="noopener"&gt;https://github.com/apache/spark/pull/44946&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Feb 2024 14:42:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/62243#M31932</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-02-28T14:42:04Z</dc:date>
    </item>
    <item>
      <title>Re: reading a tab separated CSV quietly drops empty rows</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/63391#M32225</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/75976"&gt;@Lakshay&lt;/a&gt;&amp;nbsp;Do you know of any way to speed up the GitHub merge/review process? The issue has had a proposed fix for more than 4 weeks now, but no one seems to care...&lt;/P&gt;</description>
      <pubDate>Tue, 12 Mar 2024 13:54:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-a-tab-separated-csv-quietly-drops-empty-rows/m-p/63391#M32225</guid>
      <dc:creator>Martinitus</dc:creator>
      <dc:date>2024-03-12T13:54:14Z</dc:date>
    </item>
  </channel>
</rss>

