cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

spark.read excel with formula

Braxx
Contributor II

For some reason spark is not reading the data correctly from xlsx file in the column with a formula. I am reading it from a blob storage.

Consider this simple data set

image.png 

The column "color" has formulas for all the cells like

=VLOOKUP(A4,C3:D5,2,0)

In cases where the formula could not be calculated it is read differently by excel and spark:

excel - #N/A

spark - =VLOOKUP(A4,C3:D5,2,0)

Here is my code:

 df= spark.read\
   .format("com.crealytics.spark.excel")\
   .option("header", "true")\
   .load(input_path + input_folder_general + "test1.xlsx")
    
 display(df)

And here is how the above dataset is read:

image.pngHow do I get #N/A instead of a formula?

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

View solution in original post

4 REPLIES 4

-werners-
Esteemed Contributor III

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

Kaniz
Community Manager
Community Manager

Hi @Bartosz Wachocki​ , Does @Werner Stinckens​ 's reply answer your question?

accually, the formula is underneeth all the "color" values. Red and blue are the results of a formula and are displayed correctly.The issue is in cases when the formula could not calculate the value.

Is there any way to read only the results of formulas. #N/A as #N/A. Not a formula itself?

Using regexp is risky as I have no guarantee the formula's syntax will have the same pattern.

-werners-
Esteemed Contributor III

Spark will just consume what you throw at it, it cannot interpret excel formulas etc.

So the way to go is to make sure your formula always resolves.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.