Re: Unable to read excel file from Volume

ck7007 · 09-04-2025

The Actual Working Solution

The pandas approach works (as @kmodelew confirmed), but here's the complete, tested solution for reading Excel from Volumes:

# For Excel files in Unity Catalog Volumes
import pandas as pd

# Correct path format (with leading /)
file_path = '/Volumes/catalog/schema/volume/file.xlsx'

# Read with pandas (reading .xlsx needs the openpyxl package), then convert to Spark
df_pandas = pd.read_excel(file_path)
df = spark.createDataFrame(df_pandas)
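
Since the leading / is what tripped people up in this thread, here is a minimal sketch of a check you could run before calling pd.read_excel. The helper name normalize_volume_path is hypothetical (it is not part of pandas or Databricks), and it assumes the usual /Volumes/<catalog>/<schema>/<volume>/<file> layout:

```python
from pathlib import PurePosixPath


def normalize_volume_path(path: str) -> str:
    """Hypothetical helper: add a missing leading slash and check the /Volumes layout."""
    # Strip stray whitespace and slashes, then re-add exactly one leading slash
    candidate = PurePosixPath("/" + path.strip().lstrip("/"))
    # Expected parts: ('/', 'Volumes', catalog, schema, volume, file, ...)
    parts = candidate.parts
    if len(parts) < 6 or parts[1] != "Volumes":
        raise ValueError(
            f"Expected /Volumes/<catalog>/<schema>/<volume>/<file>, got: {path!r}"
        )
    return str(candidate)


# Same placeholder path as above, but with the leading slash forgotten
file_path = normalize_volume_path("Volumes/catalog/schema/volume/file.xlsx")
print(file_path)
```

This only validates the shape of the string; it does not check that the file exists or that you have permission to read it.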

For the Crealytics Library (if needed)

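The Crealytics spark-excel library is a JVM package, so it has to be installed from Maven, not pip:
# Install via the cluster Libraries UI or an init script:
# Maven coordinates: com.crealytics:spark-excel_2.13:3.5.1_0.20.4
# (_2.13 is the Scala version and 3.5.1 the Spark version; pick the build that matches your cluster runtime)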

@bs_THE_ANALYST Thanks for catching the path issue. Yes, the leading / is critical for Volumes!

@kmodelew Great that pandas worked! For the PySparkTypeError on cluster 1, it's likely a schema inference issue. You can fix it with:
df = spark.createDataFrame(df_pandas.astype(str))  # Convert every column to string first
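
One caveat with astype(str): it also turns empty cells into the literal string 'nan' and stringifies columns that were fine as numbers. If that matters, a narrower variant is sketched below. It assumes the error comes from object columns that mix Python types (I can't confirm that is what happened on cluster 1), and stringify_mixed_columns is a hypothetical helper name, not a pandas or PySpark function:

```python
import pandas as pd


def stringify_mixed_columns(pdf: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast only mixed-type object columns to str, leaving nulls null."""
    out = pdf.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric, datetime, etc. columns are left untouched
        # Collect the Python types present among the non-null values
        types_seen = {type(v) for v in out[col].dropna()}
        if len(types_seen) > 1:
            # Stringify real values only; missing cells stay missing
            out[col] = out[col].map(lambda v: None if pd.isna(v) else str(v))
    return out


# Small made-up frame: "code" mixes a string, an int and an empty cell
sample = pd.DataFrame({"id": [1, 2, 3], "code": ["A1", 42, None]})
cleaned = stringify_mixed_columns(sample)
print(cleaned)
```

You would then pass the result to spark.createDataFrame(...) exactly as before. If the error persists, the blanket astype(str) is still the blunt fallback.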

You're all right about LLM verification: I should have tested the pip install command before suggesting it. Lesson learned. The pandas approach is simpler and built-in anyway.

Thanks for keeping the community accurate! 👍