bianca_unifeye
Databricks MVP

Databricks’ built-in Excel reader is in Beta and is documented to ingest evaluated formulas (i.e., computed results). https://docs.databricks.com/aws/en/query/formats/excel

In practice, Excel files store both the formula and (optionally) a cached calculated result. Many readers (via Apache POI under the hood) do not recalculate formulas. They read the cached result if it exists; otherwise they may fall back to returning the formula string.
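To see which cells are at risk, you can inspect the workbook directly. Here is a minimal sketch using only the Python standard library. It assumes a regular .xlsx file (a zip of XML parts in the standard, non-Strict SpreadsheetML namespace), and `find_uncached_formulas` is a hypothetical helper name, not a Databricks or POI API. It lists formula cells that have no cached value stored next to them:

```python
import zipfile
import xml.etree.ElementTree as ET

# Default SpreadsheetML namespace used by ordinary .xlsx files (assumption:
# the workbook was not saved in "Strict Open XML" format).
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def find_uncached_formulas(xlsx):
    """Return (sheet_part, cell_ref) pairs for formula cells with no cached value.

    `xlsx` may be a file path or a binary file-like object.
    """
    missing = []
    with zipfile.ZipFile(xlsx) as archive:
        for part in sorted(archive.namelist()):
            # Worksheet XML lives under xl/worksheets/; skip .rels and other parts.
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(part))
            for cell in root.iter(f"{NS}c"):
                has_formula = cell.find(f"{NS}f") is not None   # <f> holds the formula
                value = cell.find(f"{NS}v")                     # <v> holds the cached result
                has_cached = value is not None and (value.text or "") != ""
                if has_formula and not has_cached:
                    missing.append((part, cell.get("r")))
    return missing
```

Run it against a local copy of the file from S3. An empty list means every formula cell carries a cached result. Any entries returned are cells where a reader that does not recalculate has nothing to fall back on except the formula text.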

Why it can look “inconsistent” even if the S3 file didn’t change

  • The most common reason is a change in the runtime/connector version or the execution environment (e.g., different DBR, different cluster policy, preview/feature rollout), which can change the fallback behaviour in a Beta feature.

How to fix / mitigate

  1. Verify that the DBR version and cluster configuration are identical between runs (Excel support requires DBR 17.1+)

  2. Ensure the workbook has fresh cached results:

    • Open the file in Excel (or equivalent), force recalculation, then Save (this writes cached results).

    • If you can’t rely on cached results, export to CSV (values-only) upstream.

  3. If you must evaluate formulas server-side, you’ll need a different approach (e.g., pre-processing outside Spark), because Spark/Databricks isn’t an Excel calculation engine.
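For step 1, a small guard at the top of the job can make runtime drift visible. This is a sketch under assumptions: it expects the runtime version to be available as a string such as "17.1" (on classic clusters this is typically exposed through the `DATABRICKS_RUNTIME_VERSION` environment variable, but verify that in your own environment), and both function names are hypothetical. The 17.1 minimum comes from the requirement noted above.

```python
import re

# Minimum runtime for the built-in Excel reader, per the note in step 1.
MIN_EXCEL_RUNTIME = (17, 1)


def parse_runtime_version(raw):
    """Parse a leading 'major.minor' from a version string; None if absent."""
    match = re.match(r"^(\d+)\.(\d+)", raw or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_excel_reader(raw, minimum=MIN_EXCEL_RUNTIME):
    """True only when the version parses and is at least `minimum`.

    An unparseable or missing string returns False, meaning
    'could not confirm', not 'definitely unsupported'.
    """
    version = parse_runtime_version(raw)
    return version is not None and version >= minimum
```

In a notebook you might call `supports_excel_reader(os.environ.get("DATABRICKS_RUNTIME_VERSION"))` and log the raw string with each run, so two runs that behave differently can be compared by runtime first.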