- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-10-2022 07:07 AM
For some reason spark is not reading the data correctly from xlsx file in the column with a formula. I am reading it from a blob storage.
Consider this simple data set
The column "color" has formulas for all the cells like
=VLOOKUP(A4,C3:D5,2,0)
In cases where the formula could not be calculated it is read differently by excel and spark:
excel - #N/A
spark - =VLOOKUP(A4,C3:D5,2,0)
Here is my code:
df= spark.read\
.format("com.crealytics.spark.excel")\
.option("header", "true")\
.load(input_path + input_folder_general + "test1.xlsx")
display(df)
And here is how the above dataset is read:
How do I get #N/A instead of a formula?
- Labels:
-
Pyspark
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-10-2022 07:45 AM
the formula itself isprobably what is actually stored in the excel file.
Excel translates this to NA.
I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.
You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-10-2022 07:45 AM
the formula itself isprobably what is actually stored in the excel file.
Excel translates this to NA.
I only know of setErrorCellsToFallbackValues but I doubt if this is applicable in your case here.
You could use a matching function (regexp f.e.) to determine if a row contains actual output or a formula.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-11-2022 02:21 AM
accually, the formula is underneeth all the "color" values. Red and blue are the results of a formula and are displayed correctly.The issue is in cases when the formula could not calculate the value.
Is there any way to read only the results of formulas. #N/A as #N/A. Not a formula itself?
Using regexp is risky as I have no guarantee the formula's syntax will have the same pattern.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-11-2022 02:24 AM
Spark will just consume what you throw at it, it cannot interpret excel formulas etc.
So the way to go is to make sure your formula always resolves.
data:image/s3,"s3://crabby-images/618ac/618ac5f2bf7746f4cdeea1aaad5a0ab2f9192c1d" alt=""
data:image/s3,"s3://crabby-images/618ac/618ac5f2bf7746f4cdeea1aaad5a0ab2f9192c1d" alt=""