Re: Unable to read excel file from Volume

ck7007 · 09-03-2025

@kmodelew The issue is that your volume path is missing the leading forward slash. Also, the spark-excel library isn't being loaded on your cluster.

Quick Fix

# Correct path format for Volumes
location = '/Volumes/catalog/schema/volume_name/file.xlsx'  # Note the leading forward slash

# For DBR 16.4, you can use pandas instead (pandas ships with the runtime;
# reading .xlsx also needs openpyxl, so run %pip install openpyxl if it is missing)
import pandas as pd

df_pandas = pd.read_excel(location)  # reuses the path defined above
df = spark.createDataFrame(df_pandas)
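Since the missing leading slash was the root cause, a small guard can catch it before any read is attempted. This is a minimal sketch: `volume_path` and `check_volume_path` are hypothetical helpers (not part of Databricks or pandas), and `main`/`sales`/`raw` are placeholder names.

```python
# Hypothetical helpers (not a Databricks API): build and sanity-check
# Unity Catalog Volume paths of the form /Volumes/<catalog>/<schema>/<volume>/...


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Return '/Volumes/<catalog>/<schema>/<volume>/<parts...>'."""
    for name in (catalog, schema, volume, *parts):
        # Each component must be non-empty and must not smuggle in extra slashes
        if not name or "/" in name:
            raise ValueError(f"invalid path component: {name!r}")
    # The leading "" produces the leading '/' that was missing in the question
    return "/".join(["", "Volumes", catalog, schema, volume, *parts])


def check_volume_path(path: str) -> str:
    """Raise ValueError if path does not look like /Volumes/<catalog>/<schema>/<volume>/..."""
    if not path.startswith("/Volumes/"):
        raise ValueError(f"Volume paths must start with '/Volumes/': {path!r}")
    # Drop the empty segment before the first '/' and the 'Volumes' segment
    segments = path.split("/")[2:]
    if len(segments) < 3 or not all(segments[:3]):
        raise ValueError(f"expected /Volumes/<catalog>/<schema>/<volume>/...: {path!r}")
    return path
```

For example, `pd.read_excel(check_volume_path(location))` fails with an explicit message about the path instead of a less obvious file-not-found error.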

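1. Install the library correctly: spark-excel is a JVM (Scala) library, so a notebook-scoped %pip install does not make the com.crealytics.spark.excel data source available to Spark. Attach it to the cluster as a Maven library instead (Compute > your cluster > Libraries > Install new > Maven) with the coordinates com.crealytics:spark-excel_2.12:<version>, choosing the Scala suffix and version that match your runtime.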
2. Then read with the proper path:
location = '/Volumes/catalog/schema/volume/file.xlsx'
df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("dataAddress", "'Sheet1'!A1")  # Specify sheet if needed
      .load(location))
3. Alternative: Native Python Approach (no cluster library needed)
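# Using openpyxl directly for finer control (run %pip install openpyxl if it is missing)
import openpyxl
from pyspark.sql import Row

file_path = "/Volumes/catalog/schema/volume/file.xlsx"
# data_only=True returns cached formula results instead of formula strings
workbook = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
sheet = workbook.active  # or workbook["Sheet1"] for a specific sheet

# Convert to Spark DataFrame: the first row holds the column names
row_iter = sheet.iter_rows(values_only=True)
headers = [str(h) for h in next(row_iter)]
rows = [Row(**dict(zip(headers, row))) for row in row_iter]
workbook.close()  # read-only workbooks keep the file handle open until closed

df = spark.createDataFrame(rows)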
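With the openpyxl approach in step 3, blank or duplicate header cells can make `Row(**dict(zip(headers, row)))` fail (a `None` key is rejected because keyword names must be strings) or silently drop columns (duplicate keys overwrite each other). Below is a minimal, plain-Python sketch for cleaning that up before building the DataFrame; `rows_to_records` and the `_c<index>` naming for blank headers are my own assumptions, not an openpyxl or Spark convention.

```python
from typing import Any, Dict, Iterable, List, Sequence


def rows_to_records(headers: Sequence[Any], rows: Iterable[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: map openpyxl values_only rows to dicts with unique string keys."""
    names: List[str] = []
    used = set()
    for i, h in enumerate(headers):
        text = "" if h is None else str(h).strip()
        base = text or f"_c{i}"  # blank header cell -> positional name
        name, n = base, 1
        while name in used:  # de-duplicate: name, name_1, name_2, ...
            name = f"{base}_{n}"
            n += 1
        used.add(name)
        names.append(name)

    records: List[Dict[str, Any]] = []
    for row in rows:
        values = list(row)[: len(names)]  # ignore cells beyond the header width
        values += [None] * (len(names) - len(values))  # pad short rows with None
        if all(v is None for v in values):
            continue  # skip fully empty rows (common at the end of sheets)
        records.append(dict(zip(names, values)))
    return records
```

In the notebook this would slot in as `headers = next(row_iter)` followed by `df = spark.createDataFrame([Row(**r) for r in rows_to_records(headers, row_iter)])`. Spark may still be unable to infer a type for a column that is entirely empty; passing an explicit schema to `spark.createDataFrame` avoids that.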
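Back in step 2, if the data doesn't start at A1 of the first sheet, the `dataAddress` option takes an Excel-style reference such as `'Sheet1'!A1` or a range like `'My Sheet'!B3:C35`. Here is a small sketch for building that string; `data_address` is a hypothetical helper, and escaping an apostrophe in a sheet name by doubling it follows Excel's own convention, which I'm assuming spark-excel honors.

```python
import re
from typing import Optional

# A1-style cell reference: 1-3 column letters followed by a row number >= 1
_CELL = re.compile(r"^[A-Z]{1,3}[1-9][0-9]*$")


def data_address(sheet: str, start_cell: str = "A1", end_cell: Optional[str] = None) -> str:
    """Hypothetical helper: format a value for spark-excel's "dataAddress" option."""
    for cell in (start_cell, end_cell):
        if cell is not None and not _CELL.match(cell):
            raise ValueError(f"not an A1-style cell reference: {cell!r}")
    # Quote the sheet name; an embedded apostrophe is doubled (Excel convention, assumed here)
    quoted = "'" + sheet.replace("'", "''") + "'"
    ref = start_cell if end_cell is None else f"{start_cell}:{end_cell}"
    return f"{quoted}!{ref}"
```

For example, `.option("dataAddress", data_address("Q1 Sales", "B3", "F20"))` passes `'Q1 Sales'!B3:F20`, which keeps sheet names with spaces correctly quoted.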
Security Note: Volumes are governed by Unity Catalog privileges (for example READ VOLUME), so storing confidential data in a Volume, as you are doing, is a sound approach.
Which catalog/schema/volume are you using? The exact path structure matters.