<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic spark.read excel with formula in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31948#M23291</link>
    <description>&lt;P&gt;For some reason Spark is not reading the data correctly from an xlsx file in a column that contains a formula. I am reading the file from blob storage.&lt;/P&gt;&lt;P&gt;Consider this simple data set:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2199i10B8C5415B227FEE/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Every cell in the "color" column contains a formula like&lt;/P&gt;&lt;P&gt;=VLOOKUP(A4,C3:D5,2,0)&lt;/P&gt;&lt;P&gt;When the formula cannot be calculated, Excel and Spark read the cell differently:&lt;/P&gt;&lt;P&gt;Excel - #N/A&lt;/P&gt;&lt;P&gt;Spark - =VLOOKUP(A4,C3:D5,2,0)&lt;/P&gt;&lt;P&gt;Here is my code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Read the workbook with the spark-excel data source, using the first row as the header
df = (
    spark.read
    .format("com.crealytics.spark.excel")
    .option("header", "true")
    .load(input_path + input_folder_general + "test1.xlsx")
)

display(df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And here is how the above dataset is read:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2201i7BE5F748DEDCA522/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;How do I get #N/A instead of the formula?&lt;/P&gt;</description>
    <pubDate>Mon, 10 Jan 2022 15:07:19 GMT</pubDate>
    <dc:creator>Braxx</dc:creator>
    <dc:date>2022-01-10T15:07:19Z</dc:date>
    <item>
      <title>spark.read excel with formula</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31948#M23291</link>
      <description>&lt;P&gt;For some reason Spark is not reading the data correctly from an xlsx file in a column that contains a formula. I am reading the file from blob storage.&lt;/P&gt;&lt;P&gt;Consider this simple data set:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2199i10B8C5415B227FEE/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Every cell in the "color" column contains a formula like&lt;/P&gt;&lt;P&gt;=VLOOKUP(A4,C3:D5,2,0)&lt;/P&gt;&lt;P&gt;When the formula cannot be calculated, Excel and Spark read the cell differently:&lt;/P&gt;&lt;P&gt;Excel - #N/A&lt;/P&gt;&lt;P&gt;Spark - =VLOOKUP(A4,C3:D5,2,0)&lt;/P&gt;&lt;P&gt;Here is my code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Read the workbook with the spark-excel data source, using the first row as the header
df = (
    spark.read
    .format("com.crealytics.spark.excel")
    .option("header", "true")
    .load(input_path + input_folder_general + "test1.xlsx")
)

display(df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And here is how the above dataset is read:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2201i7BE5F748DEDCA522/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;How do I get #N/A instead of the formula?&lt;/P&gt;</description>
      <pubDate>Mon, 10 Jan 2022 15:07:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31948#M23291</guid>
      <dc:creator>Braxx</dc:creator>
      <dc:date>2022-01-10T15:07:19Z</dc:date>
    </item>
    <item>
      <title>Re: spark.read excel with formula</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31949#M23292</link>
      <description>&lt;P&gt;The formula itself is probably what is actually stored in the Excel file.&lt;/P&gt;&lt;P&gt;Excel translates this to #N/A.&lt;/P&gt;&lt;P&gt;The only related option I know of is setErrorCellsToFallbackValues, but I doubt it is applicable in your case.&lt;/P&gt;&lt;P&gt;You could use a matching function (a regexp, for example) to determine whether a row contains actual output or a formula.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Jan 2022 15:45:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31949#M23292</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-01-10T15:45:44Z</dc:date>
    </item>
    <item>
      <title>Re: spark.read excel with formula</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31951#M23294</link>
      <description>&lt;P&gt;Actually, there is a formula underneath all of the "color" values. Red and blue are the results of a formula and are displayed correctly. The issue only occurs when the formula cannot calculate a value.&lt;/P&gt;&lt;P&gt;Is there any way to read only the results of the formulas, so that #N/A is read as #N/A and not as the formula itself?&lt;/P&gt;&lt;P&gt;Using a regexp is risky, as I have no guarantee that the formula's syntax will always follow the same pattern.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jan 2022 10:21:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31951#M23294</guid>
      <dc:creator>Braxx</dc:creator>
      <dc:date>2022-01-11T10:21:00Z</dc:date>
    </item>
    <item>
      <title>Re: spark.read excel with formula</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31952#M23295</link>
      <description>&lt;P&gt;Spark will just consume what you throw at it; it cannot interpret Excel formulas.&lt;/P&gt;&lt;P&gt;So the way to go is to make sure your formula always resolves, for example by wrapping the lookup in IFERROR so that it returns a fallback value.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jan 2022 10:24:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-read-excel-with-formula/m-p/31952#M23295</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-01-11T10:24:56Z</dc:date>
    </item>
  </channel>
</rss>

