cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Generating Multiple Excels for SQL Query

libpekin
New Contributor

Hello,

I am getting "OSError: Errno 95: Operation not supported for the code below. I have 'openpyxl 3.1.5' installed on the cluster and have imported all required modules. I am sure this is something small, but I can't put my finger on it why this is erroring out. Thanks for taking a look!

 

 

df = spark.sql(f"""
    SELECT
        custId,
        locId
    FROM salesorders
    WHERE year = '2025'
    LIMIT 100
""")

df_region = spark.sql(f"""
    SELECT DISTINCT Id
    FROM sales.region
    WHERE isactive = 1 AND year = '2025'
""")

for row in df_region.rdd.collect():
    regId = row.__getitem__('Id')
    df_filtered = df.where(df.locId == regId)
    df_filtered.toPandas().to_excel(filePath.format(locId=regId), index=False, engine='openpyxl')

 

 

 

1 REPLY 1

Renu_
New Contributor III

Hi @libpekin, is filepath pointing to a DBFS path? Because writing directly to DBFS paths using to_excel() is not supported due to DBFS limitations with certain file operations, especially random writes.

As a workaround, first save the Excel file to a local path on the driver node, using to_excel(), then use dbutils.fs.cp() to move it to DBFS.

Let me know if this doesn’t work.