Generating Multiple Excels for SQL Query

libpekin — Mon, 03 Mar 2025 17:43:14 GMT

Hello,

I am getting "OSError: Errno 95: Operation not supported for the code below. I have 'openpyxl 3.1.5' installed on the cluster and have imported all required modules. I am sure this is something small, but I can't put my finger on it why this is erroring out. Thanks for taking a look!

df = spark.sql(f""" SELECT custId, locId FROM salesorders WHERE year = '2025' LIMIT 100 """) df_region = spark.sql(f""" SELECT DISTINCT Id FROM sales.region WHERE isactive = 1 AND year = '2025' """) for row in df_region.rdd.collect(): regId = row.__getitem__('Id') df_filtered = df.where(df.locId == regId) df_filtered.toPandas().to_excel(filePath.format(locId=regId), index=False, engine='openpyxl')

Re: Generating Multiple Excels for SQL Query

Renu_ — Wed, 02 Apr 2025 19:11:44 GMT

Hi @libpekin, is filepath pointing to a DBFS path? Because writing directly to DBFS paths using to_excel() is not supported due to DBFS limitations with certain file operations, especially random writes.

As a workaround, first save the Excel file to a local path on the driver node, using to_excel(), then use dbutils.fs.cp() to move it to DBFS.

Let me know if this doesn’t work.

topic Re: Generating Multiple Excels for SQL Query in Data Engineering

Generating Multiple Excels for SQL Query

Re: Generating Multiple Excels for SQL Query