09-03-2025 08:24 AM
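@kmodelew The issue is that your volume path is missing the leading forward slash. Also, the spark-excel library doesn't appear to be loaded on the cluster, so Spark can't resolve the `com.crealytics.spark.excel` data source.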
Quick Fix
```python
# Correct path format for Volumes: note the leading "/"
location = "/Volumes/catalog/schema/volume_name/file.xlsx"

# For DBR 16.4, use pandas instead (ships with the runtime, no cluster library needed).
# .xlsx files are read through the openpyxl engine; if it is missing, run %pip install openpyxl
import pandas as pd

df_pandas = pd.read_excel(location)
df = spark.createDataFrame(df_pandas)
```
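Because the missing leading slash is the root cause here, it can help to build or check the path before reading. This is a minimal stdlib-only sketch; `volume_path` and `is_volume_path` are hypothetical helpers (not a Databricks API), and they only check the `/Volumes/<catalog>/<schema>/<volume>/...` layout, not whether the file exists or you have access to it.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build an absolute Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/..."""
    for name in (catalog, schema, volume, *parts):
        # Reject empty names and absolute parts (an absolute part would reset the join)
        if not name or name.startswith("/"):
            raise ValueError(f"invalid path component: {name!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))


def is_volume_path(path: str) -> bool:
    """True if path is absolute and has the /Volumes/<catalog>/<schema>/<volume> prefix."""
    p = PurePosixPath(path)
    # parts of "/Volumes/c/s/v" are ("/", "Volumes", "c", "s", "v")
    return p.is_absolute() and len(p.parts) >= 5 and p.parts[1] == "Volumes"
```

For example, `pd.read_excel(volume_path("catalog", "schema", "volume_name", "file.xlsx"))` reads the same file as the quick fix, and `is_volume_path("Volumes/catalog/schema/volume_name/file.xlsx")` returns `False`, flagging the missing slash.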
If You Must Use spark-excel
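1. Install the library on the cluster. spark-excel is a JVM (Scala) library published to Maven, not a pip package, so `%pip install spark-excel` followed by `dbutils.library.restartPython()` won't make the `com.crealytics.spark.excel` data source available to Spark. Instead, go to Compute > your cluster > Libraries > Install new > Maven and add the `com.crealytics:spark-excel_<scala-version>:<spark-version>_<release>` artifact that matches your cluster's Scala and Spark versions. On standard (shared) access mode clusters, a metastore admin may also need to add these coordinates to the Unity Catalog allowlist.
2. Then read with the proper path:
```python
location = "/Volumes/catalog/schema/volume/file.xlsx"
df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("dataAddress", "'Sheet1'!A1")  # Specify sheet if needed
      .load(location))
```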
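The `dataAddress` value uses Excel reference syntax: a quoted sheet name, `!`, then a start cell or a cell range. A minimal sketch of building it; `data_address` is a hypothetical helper, and the assumption that spark-excel accepts Excel's convention of escaping an apostrophe in a sheet name as `''` is unverified.

```python
from typing import Optional


def data_address(sheet: str, start: str = "A1", end: Optional[str] = None) -> str:
    """Build an Excel-style reference such as "'Sheet1'!A1" or "'Q1 Sales'!B3:F35"."""
    # Excel quotes sheet names with single quotes and escapes an embedded ' as ''
    quoted = "'" + sheet.replace("'", "''") + "'"
    cells = start if end is None else f"{start}:{end}"
    return f"{quoted}!{cells}"
```

Used as `.option("dataAddress", data_address("Q1 Sales", "B3", "F35"))`, it reads only that block of the "Q1 Sales" sheet; quoting every sheet name, not just ones with spaces, keeps the rule simple.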
Alternative: Read with openpyxl (No Spark Library Needed)
```python
# Using openpyxl directly for better control
import openpyxl
from pyspark.sql import Row

file_path = "/Volumes/catalog/schema/volume/file.xlsx"
workbook = openpyxl.load_workbook(file_path, read_only=True)
sheet = workbook.active

# Convert to Spark DataFrame: the first row holds the column names
headers = [cell.value for cell in sheet[1]]
rows = [Row(**dict(zip(headers, row))) for row in sheet.iter_rows(min_row=2, values_only=True)]
workbook.close()  # read-only workbooks keep the file open until closed
df = spark.createDataFrame(rows)
```
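The `Row(**dict(zip(headers, row)))` approach above breaks on two common spreadsheet quirks: a blank header cell becomes a `None` keyword (a `TypeError`), and a repeated header name silently keeps only the last column. A minimal stdlib-only sketch of cleaning the header and rows first; `normalize_headers` and `clean_rows` are hypothetical helpers, and the `_c<index>` placeholder naming is an arbitrary choice.

```python
from typing import Any, Iterable, List, Sequence, Tuple


def normalize_headers(raw: Sequence[Any]) -> List[str]:
    """Turn a header row into unique, non-empty column names."""
    names: List[str] = []
    used = set()
    for i, value in enumerate(raw):
        text = "" if value is None else str(value).strip()
        base = text or f"_c{i}"  # placeholder name for a blank header cell
        name, n = base, 1
        while name in used:  # suffix duplicates: id, id_1, id_2, ...
            name = f"{base}_{n}"
            n += 1
        used.add(name)
        names.append(name)
    return names


def clean_rows(rows: Iterable[Sequence[Any]], width: int) -> List[Tuple[Any, ...]]:
    """Pad or trim each row to `width` values and drop rows that are entirely empty."""
    out: List[Tuple[Any, ...]] = []
    for row in rows:
        values = tuple(row[:width]) + (None,) * (width - len(row))
        if any(v is not None for v in values):
            out.append(values)
    return out
```

In the notebook this would replace the loop: `rows_iter = sheet.iter_rows(values_only=True)`, `headers = normalize_headers(next(rows_iter))`, then `df = spark.createDataFrame(clean_rows(rows_iter, len(headers)), schema=headers)`. Spark still infers column types from the values, so a column that is entirely empty may need an explicit schema.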
Security Note: Volumes are governed by Unity Catalog access controls (for example, the READ VOLUME privilege), so reading confidential files from a volume is a sound approach.
Which catalog/schema/volume are you using? The exact path structure matters.