Databricks Community

Braxx · ‎01-10-2022

For some reason spark is not reading the data correctly from xlsx file in the column with a formula. I am reading it from a blob storage.

Consider this simple data set

The column "color" has formulas for all the cells like

=VLOOKUP(A4,C3:D5,2,0)

In cases where the formula could not be calculated it is read differently by excel and spark:

excel - #N/A

spark - =VLOOKUP(A4,C3:D5,2,0)

Here is my code:

 df= spark.read\
   .format("com.crealytics.spark.excel")\
   .option("header", "true")\
   .load(input_path + input_folder_general + "test1.xlsx")
    
 display(df)

And here is how the above dataset is read:

How do I get #N/A instead of a formula?

-werners- · ‎01-10-2022

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

View solution in original post

-werners- · ‎01-10-2022

the formula itself isprobably what is actually stored in the excel file.

Excel translates this to NA.

I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.

You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.

Braxx · ‎01-11-2022

accually, the formula is underneeth all the "color" values. Red and blue are the results of a formula and are displayed correctly.The issue is in cases when the formula could not calculate the value.

Is there any way to read only the results of formulas. #N/A as #N/A. Not a formula itself?

Using regexp is risky as I have no guarantee the formula's syntax will have the same pattern.

-werners- · ‎01-11-2022

Spark will just consume what you throw at it, it cannot interpret excel formulas etc.

So the way to go is to make sure your formula always resolves.

Databricks Community

spark.read excel with formula

Connect with Databricks Users in Your Area

Introducing an exclusively Databricks-hosted Assistant

How to present and share your Notebook insights in AI/BI Dashboards

Meet the Databricks MVPs

Now Hiring: Databricks Community Technical Moderator

Insights from a global survey of 1,100 technologists and interviews with 28 CIOs