How can we save a DataFrame in DOCX format using PySpark?

rammy
Contributor III


I am trying to save a DataFrame as a Word document, but it fails with the error below:

java.lang.ClassNotFoundException: Failed to find data source: docx. Please find packages at http://spark.apache.org/third-party-projects.htm  

    # f_data is my DataFrame with data
    f_data.write.format("docx").save("dbfs:/FileStore/test/test.csv")
    display(f_data)

Note that I can save files in CSV, text, and JSON format, but is there any way to save a DOCX file using PySpark?

Harun
Honored Contributor

Hi @Ramesh Bathini

DataFrameWriter only has built-in support for the formats below; other formats such as docx need a third-party data source:

  • text
  • csv
  • jdbc
  • json
  • parquet
  • orc

Source code for DataFrameWriter:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWr...

jose_gonzalez
Databricks Employee

Hi,

You cannot do it from PySpark, but you can try using pandas to save to Excel. There is no Docx
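If a real .docx file is required, one possible workaround (not something Spark itself provides) is to collect the DataFrame to the driver and write the file yourself. The sketch below assumes the data is small enough for `collect()`, and hand-builds a minimal DOCX package (a zip of three XML parts) using only the Python standard library. The helper name `rows_to_docx` and the output path are hypothetical; in practice a library such as python-docx is the more usual choice.

```python
import zipfile
from xml.sax.saxutils import escape

# Minimal OOXML package parts: content types, root relationship, and the body.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _cell(value):
    """One table cell holding the value as plain text (None becomes empty).

    Only &, < and > are escaped; XML-invalid control characters are not handled.
    """
    text = "" if value is None else escape(str(value))
    return f'<w:tc><w:p><w:r><w:t xml:space="preserve">{text}</w:t></w:r></w:p></w:tc>'


def _row(values):
    return "<w:tr>" + "".join(_cell(v) for v in values) + "</w:tr>"


def rows_to_docx(path, columns, rows):
    """Write a header row plus data rows as one (borderless) table in a .docx file."""
    grid = "".join("<w:gridCol/>" for _ in columns)
    table = (
        '<w:tbl><w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>'
        f"<w:tblGrid>{grid}</w:tblGrid>"
        + _row(columns)
        + "".join(_row(r) for r in rows)
        + "</w:tbl>"
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{table}<w:p/></w:body></w:document>'
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document)


# Hypothetical usage on Databricks, assuming f_data fits in driver memory and
# the cluster exposes the /dbfs local mount (Row objects iterate like tuples):
# rows_to_docx("/dbfs/FileStore/test/test.docx", f_data.columns, f_data.collect())
```

Because the file is written on the driver with ordinary file I/O, it produces one .docx file rather than the directory of part files that `DataFrameWriter.save()` creates.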