cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

spark.read excel with formula

Braxx
Contributor II

For some reason spark is not reading the data correctly from xlsx file in the column with a formula. I am reading it from a blob storage.

Consider this simple data set

image.png 

The column "color" has formulas for all the cells like

=VLOOKUP(A4,C3:D5,2,0)

In cases where the formula could not be calculated it is read differently by excel and spark:

excel - #N/A

spark - =VLOOKUP(A4,C3:D5,2,0)

Here is my code:

 df= spark.read\
   .format("com.crealytics.spark.excel")\
   .option("header", "true")\
   .load(input_path + input_folder_general + "test1.xlsx")
    
 display(df)

And here is how the above dataset is read:

image.pngHow do I get #N/A instead of a formula?

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

View solution in original post

3 REPLIES 3

-werners-
Esteemed Contributor III

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

accually, the formula is underneeth all the "color" values. Red and blue are the results of a formula and are displayed correctly.The issue is in cases when the formula could not calculate the value.

Is there any way to read only the results of formulas. #N/A as #N/A. Not a formula itself?

Using regexp is risky as I have no guarantee the formula's syntax will have the same pattern.

-werners-
Esteemed Contributor III

Spark will just consume what you throw at it, it cannot interpret excel formulas etc.

So the way to go is to make sure your formula always resolves.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group