11-17-2021 07:00 AM
I was reading an excel file with one column,
country
india
India
india
India
india
dataframe i got from this data : df.show()
+-------+ |country| +-------+ |
india |
| India |
| india |
| India |
| india | +-------+
In the next step i removed last value from the excel file manually by backspace and saved the file,
file now :
country
india
India
india
India
now when i run the same df.show(), this is what i get:
+-------+ |country| +-------+ |
india |
|India |
|india |
|India | |
null | +-------+
if i have removed the value why do i get a null value at its place?
and my code if someone needs it,
val spark = SparkSession
.builder
.appName("schemaTest")
.master("local[*]")
.getOrCreate()
val df = spark.read
.format("com.crealytics.spark.excel").
option("header", "false").
option("inferSchema", "true").
option("treatEmptyValuesAsNulls", "false").
option("addColorColumns", "False").
load("data/trimTest2.xlsx")
df.show()
edit : when i was changing some value in my excel file, i was using backspace instead of delete row in excel, which made excel thing that there is a row which is blank,
but if you use delete row then excel deletes the complete row and spark does not read anything there.
11-18-2021 08:35 PM
I got the issue, i was using backspace to delete a value instead of delete rows command in excel.
11-17-2021 07:26 AM
I think problem is on excel side. Personally when I got anything from excel I prefer to use tables in excel to control what I am getting. Alternatively just specify range:
.option("dataAddress", "'My Sheet'!B3:C35")
11-18-2021 08:22 PM
yes the problem is excel, i think there is no sure way other then reading it manually might go with it.
11-17-2021 10:59 AM
hi @sarvesh singh ,
Have you try to use Koalas to read your Excel data instead of using this 3rd party library? You should be able to find many examples online on how to use Koalas to read Excel files. You can check Koalas's APIs here
11-18-2021 08:20 PM
Hi , well i work with scala and I tried but couldn't find koalas for scala.
11-18-2021 12:54 AM
Excel internally still thinks the cell exists but with no content, hence the null.
Jose's and Hubert's answers are both viable options. Or use a text format (csv f.e.) which you can open in an ordinary text editor if that is an option of course.
I try to avoid receiving excel files, do not like it at all.
11-18-2021 08:21 PM
I have to handle an excel file in m project, i understand where this problem is coming from but reading a sheet manually is not a dynamic solution for me.
11-18-2021 08:35 PM
I got the issue, i was using backspace to delete a value instead of delete rows command in excel.
11-19-2021 08:52 AM
@sarvesh singh - Thank you for letting us know. Would you be happy to mark the best answer so others can find the solution easily?
11-22-2021 12:56 AM
well i figured it out on my own, that is why i mentioned it my comment has the solution for anyone who needs it.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group