cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Getting Null values at the place of data which was removed manually from excel file( solved )

sarvesh
Contributor III

I was reading an excel file with one column,

country

india

India

india

India

india

dataframe i got from this data : df.show()

+-------+ |country| +-------+ |

india |

| India |

| india |

| India |

| india | +-------+

In the next step i removed last value from the excel file manually by backspace and saved the file,

file now :

country

india

India

india

India

now when i run the same df.show(), this is what i get:

+-------+ |country| +-------+ |

india |

|India |

|india |

|India | |

null | +-------+

if i have removed the value why do i get a null value at its place?

and my code if someone needs it,

val spark = SparkSession

.builder

.appName("schemaTest")

.master("local[*]")

.getOrCreate()

val df = spark.read

.format("com.crealytics.spark.excel").

option("header", "false").

option("inferSchema", "true").

option("treatEmptyValuesAsNulls", "false").

option("addColorColumns", "False").

load("data/trimTest2.xlsx")

df.show()

edit : when i was changing some value in my excel file, i was using backspace instead of delete row in excel, which made excel thing that there is a row which is blank,

but if you use delete row then excel deletes the complete row and spark does not read anything there.

1 ACCEPTED SOLUTION

Accepted Solutions

sarvesh
Contributor III

I got the issue, i was using backspace to delete a value instead of delete rows command in excel.

View solution in original post

9 REPLIES 9

Hubert-Dudek
Esteemed Contributor III

I think problem is on excel side. Personally when I got anything from excel I prefer to use tables in excel to control what I am getting. Alternatively just specify range:

.option("dataAddress", "'My Sheet'!B3:C35")

yes the problem is excel, i think there is no sure way other then reading it manually might go with it.

jose_gonzalez
Databricks Employee
Databricks Employee

hi @sarvesh singh​ ,

Have you try to use Koalas to read your Excel data instead of using this 3rd party library? You should be able to find many examples online on how to use Koalas to read Excel files. You can check Koalas's APIs here

Hi , well i work with scala and I tried but couldn't find koalas for scala.

-werners-
Esteemed Contributor III

Excel internally still thinks the cell exists but with no content, hence the null.

Jose's and Hubert's answers are both viable options. Or use a text format (csv f.e.) which you can open in an ordinary text editor if that is an option of course.

I try to avoid receiving excel files, do not like it at all.

I have to handle an excel file in m project, i understand where this problem is coming from but reading a sheet manually is not a dynamic solution for me.

sarvesh
Contributor III

I got the issue, i was using backspace to delete a value instead of delete rows command in excel.

Anonymous
Not applicable

@sarvesh singh​ - Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?

well i figured it out on my own, that is why i mentioned it my comment has the solution for anyone who needs it.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group