09-04-2025 03:04 PM
@szymon_dybczak @BS_THE_ANALYST @TheOC:
The Actual Working Solution
The pandas approach works (as @kmodelew confirmed), but here's the complete, tested solution for reading Excel from Volumes:
# For Excel files in Unity Catalog Volumes
import pandas as pd
# Correct path format (with leading /)
file_path = '/Volumes/catalog/schema/volume/file.xlsx'
# Read with pandas (reading .xlsx relies on the openpyxl package), then convert to Spark
df_pandas = pd.read_excel(file_path)
df = spark.createDataFrame(df_pandas)
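Since the leading / is the part that tripped people up in this thread, here is a minimal sketch of a helper that builds the path for you. `volume_file_path` is a hypothetical name of my own, not a Databricks API, and the catalog/schema/volume values are the same placeholders as above.

```python
import posixpath


def volume_file_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build an absolute /Volumes/... path (hypothetical helper, not a Databricks API)."""
    names = (catalog, schema, volume, *parts)
    # Strip stray slashes so the result always has exactly one leading '/'.
    cleaned = [name.strip("/") for name in names]
    if any(not name for name in cleaned):
        raise ValueError("every path component must be non-empty")
    return posixpath.join("/Volumes", *cleaned)


# Placeholder names, matching the example above.
file_path = volume_file_path("catalog", "schema", "volume", "file.xlsx")
```

The result can be passed straight to pd.read_excel(file_path) as in the snippet above.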
For the Crealytics Library (if needed)
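The Crealytics spark-excel library is a JVM library, so it has to be installed from Maven coordinates, not with pip:
# Install via cluster libraries UI or init script:
# Maven coordinates: com.crealytics:spark-excel_2.13:3.5.1_0.20.4
# (the _2.13 suffix and the 3.5.1 prefix must match your cluster's Scala and Spark versions)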
@BS_THE_ANALYST Thanks for catching the path issue. Yes, the leading / is critical for volumes!
@kmodelew Great that pandas worked! The PySparkTypeError on cluster 1 is likely a schema inference issue. You can work around it with:
df = spark.createDataFrame(df_pandas.astype(str))  # Convert all columns to strings first
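One caveat: astype(str) converts every column, including the numeric ones, and depending on your pandas version it can turn missing values into literal strings such as 'nan'. If the error really does come from columns that mix types (an assumption on my part, I haven't seen the file), a narrower sketch is to stringify only those columns. `stringify_mixed_columns` is a hypothetical helper name, not part of pandas or PySpark.

```python
import pandas as pd


def stringify_mixed_columns(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where only object columns holding mixed Python types become strings."""
    out = pdf.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric, datetime, etc. are left untouched
        # Collect the Python types of the non-missing values in this column.
        kinds = {type(value) for value in out[col].dropna()}
        if len(kinds) > 1:
            # Keep missing values missing instead of turning them into text.
            out[col] = out[col].map(lambda v: None if pd.isna(v) else str(v))
    return out
```

Usage would be df = spark.createDataFrame(stringify_mixed_columns(df_pandas)). Columns with a single consistent type keep it, so Spark can still infer integers and doubles where the data allows.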
You're all right about LLM verification—I should have tested the pip install command before suggesting it. Lesson learned. The pandas approach is simpler and built-in anyway.
Thanks for keeping the community accurate! 👍