<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Getting Null values at the place of data which was removed manually from excel file( solved ) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34883#M25576</link>
    <description>&lt;P&gt;I was reading an excel file with one column,&lt;/P&gt;&lt;P&gt;country &lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India &lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India &lt;/P&gt;&lt;P&gt;india&lt;/P&gt;&lt;P&gt;dataframe i got from this data : df.show()&lt;/P&gt;&lt;P&gt;+-------+ |country| +-------+ |&lt;/P&gt;&lt;P&gt; india | &lt;/P&gt;&lt;P&gt;| India |&lt;/P&gt;&lt;P&gt; | india | &lt;/P&gt;&lt;P&gt;| India | &lt;/P&gt;&lt;P&gt;| india | +-------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the next step i removed last value from the excel file manually by backspace and saved the file,&lt;/P&gt;&lt;P&gt;file now :&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;country&lt;/P&gt;&lt;P&gt; india &lt;/P&gt;&lt;P&gt;India&lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India&lt;/P&gt;&lt;P&gt;now when i run the same df.show(), this is what i get:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+-------+ |country| +-------+ |&lt;/P&gt;&lt;P&gt;india |&lt;/P&gt;&lt;P&gt; |India |&lt;/P&gt;&lt;P&gt; |india | &lt;/P&gt;&lt;P&gt;|India | |&lt;/P&gt;&lt;P&gt; null | +-------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if i have removed the value why do i get a null value at its place?&lt;/P&gt;&lt;P&gt;and my code if someone needs it,&lt;/P&gt;&lt;P&gt;val spark = SparkSession&lt;/P&gt;&lt;P&gt; .builder&lt;/P&gt;&lt;P&gt; .appName("schemaTest")&lt;/P&gt;&lt;P&gt; .master("local[*]")&lt;/P&gt;&lt;P&gt; .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt; .format("com.crealytics.spark.excel").&lt;/P&gt;&lt;P&gt; option("header", "false").&lt;/P&gt;&lt;P&gt; option("inferSchema", "true").&lt;/P&gt;&lt;P&gt; option("treatEmptyValuesAsNulls", "false").&lt;/P&gt;&lt;P&gt; option("addColorColumns", "False").&lt;/P&gt;&lt;P&gt; load("data/trimTest2.xlsx")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;edit : when i was changing some value in my excel file, i was using backspace instead of delete row in excel, which made excel thing that there is a row which is blank,&lt;/P&gt;&lt;P&gt;but if you use delete row then excel deletes the complete row and spark does not read anything there.&lt;/P&gt;</description>
    <pubDate>Wed, 17 Nov 2021 15:00:21 GMT</pubDate>
    <dc:creator>sarvesh</dc:creator>
    <dc:date>2021-11-17T15:00:21Z</dc:date>
    <item>
      <title>Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34883#M25576</link>
      <description>&lt;P&gt;I was reading an excel file with one column,&lt;/P&gt;&lt;P&gt;country &lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India &lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India &lt;/P&gt;&lt;P&gt;india&lt;/P&gt;&lt;P&gt;dataframe i got from this data : df.show()&lt;/P&gt;&lt;P&gt;+-------+ |country| +-------+ |&lt;/P&gt;&lt;P&gt; india | &lt;/P&gt;&lt;P&gt;| India |&lt;/P&gt;&lt;P&gt; | india | &lt;/P&gt;&lt;P&gt;| India | &lt;/P&gt;&lt;P&gt;| india | +-------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the next step i removed last value from the excel file manually by backspace and saved the file,&lt;/P&gt;&lt;P&gt;file now :&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;country&lt;/P&gt;&lt;P&gt; india &lt;/P&gt;&lt;P&gt;India&lt;/P&gt;&lt;P&gt;india &lt;/P&gt;&lt;P&gt;India&lt;/P&gt;&lt;P&gt;now when i run the same df.show(), this is what i get:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;+-------+ |country| +-------+ |&lt;/P&gt;&lt;P&gt;india |&lt;/P&gt;&lt;P&gt; |India |&lt;/P&gt;&lt;P&gt; |india | &lt;/P&gt;&lt;P&gt;|India | |&lt;/P&gt;&lt;P&gt; null | +-------+&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if i have removed the value why do i get a null value at its place?&lt;/P&gt;&lt;P&gt;and my code if someone needs it,&lt;/P&gt;&lt;P&gt;val spark = SparkSession&lt;/P&gt;&lt;P&gt; .builder&lt;/P&gt;&lt;P&gt; .appName("schemaTest")&lt;/P&gt;&lt;P&gt; .master("local[*]")&lt;/P&gt;&lt;P&gt; .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;val df = spark.read&lt;/P&gt;&lt;P&gt; .format("com.crealytics.spark.excel").&lt;/P&gt;&lt;P&gt; option("header", "false").&lt;/P&gt;&lt;P&gt; option("inferSchema", "true").&lt;/P&gt;&lt;P&gt; option("treatEmptyValuesAsNulls", "false").&lt;/P&gt;&lt;P&gt; option("addColorColumns", "False").&lt;/P&gt;&lt;P&gt; load("data/trimTest2.xlsx")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df.show()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;edit : when i was changing some value in my excel file, i was using backspace instead of delete row in excel, which made excel thing that there is a row which is blank,&lt;/P&gt;&lt;P&gt;but if you use delete row then excel deletes the complete row and spark does not read anything there.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Nov 2021 15:00:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34883#M25576</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-17T15:00:21Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34884#M25577</link>
      <description>&lt;P&gt;I think problem is on excel side. Personally when I got anything from excel I prefer to use tables in excel to control what I am getting. Alternatively just specify range:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;.option("dataAddress", "'My Sheet'!B3:C35")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Nov 2021 15:26:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34884#M25577</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-11-17T15:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34885#M25578</link>
      <description>&lt;P&gt;hi @sarvesh singh​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Have you try to use Koalas to read your Excel data instead of using this 3rd party library? You should be able to find many examples online on how to use Koalas to read Excel files. You can check Koalas's APIs &lt;A href="https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_excel.html" alt="https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_excel.html" target="_blank"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Nov 2021 18:59:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34885#M25578</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-17T18:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34886#M25579</link>
      <description>&lt;P&gt;Excel internally still thinks the cell exists but with no content, hence the null.&lt;/P&gt;&lt;P&gt;Jose's and Hubert's answers are both viable options. Or use a text format (csv f.e.) which you can open in an ordinary text editor if that is an option of course.&lt;/P&gt;&lt;P&gt;I try to avoid receiving excel files, do not like it at all.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Nov 2021 08:54:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34886#M25579</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-18T08:54:58Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34887#M25580</link>
      <description>&lt;P&gt;Hi , well i work with scala and I tried but couldn't find koalas for scala.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 04:20:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34887#M25580</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-19T04:20:05Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34888#M25581</link>
      <description>&lt;P&gt;I have to handle an excel file in m project, i understand where this problem is coming from but reading a sheet manually is not a dynamic solution for me.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 04:21:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34888#M25581</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-19T04:21:24Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34889#M25582</link>
      <description>&lt;P&gt;yes the problem is excel, i think there is no sure way other then reading it manually might go with it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 04:22:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34889#M25582</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-19T04:22:22Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34890#M25583</link>
      <description>&lt;P&gt;I got the issue, i was using backspace to delete a value instead of delete rows command in excel.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 04:35:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34890#M25583</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-19T04:35:25Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34891#M25584</link>
      <description>&lt;P&gt;@sarvesh singh​&amp;nbsp;- Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?&lt;/P&gt;</description>
      <pubDate>Fri, 19 Nov 2021 16:52:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34891#M25584</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-19T16:52:28Z</dc:date>
    </item>
    <item>
      <title>Re: Getting Null values at the place of data which was removed manually from excel file( solved )</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34892#M25585</link>
      <description>&lt;P&gt;well i figured it out on my own, that is why i mentioned it my comment has the solution for anyone who needs it.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 08:56:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-null-values-at-the-place-of-data-which-was-removed/m-p/34892#M25585</guid>
      <dc:creator>sarvesh</dc:creator>
      <dc:date>2021-11-22T08:56:46Z</dc:date>
    </item>
  </channel>
</rss>

