ck7007
Contributor II

@kmodelew The issue is that your volume path is missing the leading forward slash. Also, the library isn't loading properly.

Quick Fix

# Correct path format for Volumes
location = '/Volumes/catalog/schema/volume_name/file.xlsx'  # Note the leading slash

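# For DBR 16.4, use pandas instead (preinstalled, no cluster library needed)
# Reading .xlsx relies on openpyxl; run %pip install openpyxl if the import fails
import pandas as pd

# Replace catalog, schema, and volume with your own names
df_pandas = pd.read_excel("/Volumes/catalog/schema/volume/file.xlsx")
df = spark.createDataFrame(df_pandas)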
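
If you build the path from variables, a small helper can guarantee the leading slash. This is only a sketch: `volume_file_path` is a hypothetical name, not a Databricks API, and it assumes the standard `/Volumes/<catalog>/<schema>/<volume>/<file>` layout described above.

```python
import posixpath


def volume_file_path(catalog: str, schema: str, volume: str, filename: str) -> str:
    """Build an absolute Volume path (hypothetical helper, not a Databricks API)."""
    parts = [catalog, schema, volume, filename]
    # Strip stray leading/trailing slashes so the segments join cleanly
    cleaned = [part.strip("/") for part in parts]
    if not all(cleaned):
        raise ValueError("catalog, schema, volume and filename must all be non-empty")
    # Starting from "/Volumes" is what gives the path its leading slash
    return posixpath.join("/Volumes", *cleaned)


location = volume_file_path("catalog", "schema", "volume_name", "file.xlsx")
print(location)  # /Volumes/catalog/schema/volume_name/file.xlsx
```

You can then pass `location` to `pd.read_excel` exactly as in the snippet above.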

If You Must Use spark-excel

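  1. Install the library correctly. spark-excel is a JVM library published on Maven, not a pip package, so %pip install will not register the com.crealytics.spark.excel format. Attach it to the cluster instead:
    # Cluster > Libraries > Install new > Maven
    # Coordinates: com.crealytics:spark-excel_2.12:<version>
    # Pick the build that matches your cluster's Scala and Spark versions
  2. Then read with the proper path:
    location = '/Volumes/catalog/schema/volume/file.xlsx'
    df = (spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")
        .option("inferSchema", "true")
        .option("dataAddress", "'Sheet1'!A1")  # Specify sheet if needed
        .load(location))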

Alternative: Native Approach (Most Reliable)

# Using openpyxl directly for better control
import openpyxl
from pyspark.sql import Row

file_path = "/Volumes/catalog/schema/volume/file.xlsx"
workbook = openpyxl.load_workbook(file_path, read_only=True)
sheet = workbook.active

# Convert to Spark DataFrame
rows = []
headers = [cell.value for cell in sheet[1]]  # first row holds the column names
for row in sheet.iter_rows(min_row=2, values_only=True):
    rows.append(Row(**dict(zip(headers, row))))
workbook.close()  # read-only workbooks keep the file open until closed

df = spark.createDataFrame(rows)
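
One caveat with the header row: openpyxl returns None for empty header cells, and Row(**...) needs string keys, so blank or duplicated headers can break the conversion. Below is a minimal sketch of one way to clean them up first. `normalize_headers` and `rows_to_records` are hypothetical helpers, and the `_c<index>` fallback name is my own assumption, not something openpyxl or Spark requires.

```python
def normalize_headers(headers):
    """Return non-empty string column names, suffixing repeats (hypothetical helper)."""
    seen = {}
    result = []
    for index, value in enumerate(headers):
        # Empty header cells come back as None from openpyxl
        name = str(value).strip() if value is not None else ""
        if not name:
            name = f"_c{index}"  # assumed fallback name for blank headers
        # Add a numeric suffix when the same name appears again
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count}")
    return result


def rows_to_records(headers, rows):
    """Pair each data row with the cleaned header names."""
    names = normalize_headers(headers)
    return [dict(zip(names, row)) for row in rows]


records = rows_to_records(["id", None, "id"], [(1, "a", 10), (2, "b", 20)])
print(records)
```

Each dict in `records` can then be turned into a Row with `Row(**record)` before calling `spark.createDataFrame`. The suffixing is deliberately simple and does not guard against a sheet that already contains a header such as `id_1`.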

Security Note: Volumes are governed by Unity Catalog access controls, so only users who have been granted access to the volume can read the file. Your approach is correct for confidential data.

Which catalog/schema/volume are you using? The exact path structure matters.