Re: How can we write a pandas dataframe into azure...

FerArribas · ‎01-03-2023

I'm not sure about that. When you call to_excel, all the data is loaded onto the driver (as if you were doing a collect). So the write is not distributed, and you can run into the memory and performance problems I mentioned.

Reference: https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFram...

Try writing with this library:

https://github.com/crealytics/spark-excel

Example (https://github.com/crealytics/spark-excel/issues/134#issuecomment-517696354):

df.write.format("com.crealytics.spark.excel").save("test.xlsx")
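For PySpark, the write call above could be wrapped roughly as follows. This is only a sketch under a few assumptions: the spark-excel library is already installed on the cluster, `sdf` is a Spark DataFrame (for a pandas-on-Spark DataFrame you would first convert it with `to_spark()`, and for a plain pandas DataFrame with `spark.createDataFrame(pdf)`), and the `header` option is accepted by the installed version of the library (older releases may use a different option name). The function name `write_excel_with_spark` is made up for illustration.

```python
def write_excel_with_spark(sdf, path, header=True, mode="overwrite"):
    """Write a Spark DataFrame to an .xlsx file via the spark-excel data source.

    Assumptions (not from the original post): the spark-excel package is
    installed on the cluster, and its "header" option is supported by the
    installed version.
    """
    (
        sdf.write
        .format("com.crealytics.spark.excel")    # data source named in the post
        .option("header", str(header).lower())   # "true" / "false" as a string
        .mode(mode)                               # standard DataFrameWriter save mode
        .save(path)
    )
```

Usage would look like `write_excel_with_spark(psdf.to_spark(), "test.xlsx")`, where `psdf` is a hypothetical pandas-on-Spark DataFrame. The point of going through `df.write` is that the write is handled by the Spark data source instead of calling to_excel on the driver.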