
How can we save a data frame in Docx format using pyspark?

rammy
Contributor III

  

I am trying to save a DataFrame as a Word document, but the write fails with the error below:

java.lang.ClassNotFoundException: Failed to find data source: docx. Please find packages at http://spark.apache.org/third-party-projects.htm  

           #f_data is my dataframe with data
           f_data.write.format("docx").save("dbfs:/FileStore/test/test.csv")
           display(f_data)
 

Note that I can save files in CSV, text, and JSON format. Is there any way to save a docx file using PySpark?

2 REPLIES

Harun
Honored Contributor

Hi @Ramesh Bathini​ 

Only the following built-in formats are supported by DataFrameWriter:

  • text
  • csv
  • jdbc
  • json
  • parquet
  • orc

Source Code for DataframeWriter:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWr...

jose_gonzalez
Moderator

Hi,

You cannot do this directly from PySpark, but you can try using pandas to save the data to Excel. Spark has no docx data source.
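If a Word file is really needed, one possible workaround is to collect a small DataFrame to the driver and write the .docx yourself. The sketch below is not a Spark feature and not something from this thread: `rows_to_docx` is a hypothetical helper that uses only the Python standard library to build a minimal Office Open XML package containing a single table. It has not been checked against every Word version, and it only suits data that fits in driver memory. A third-party library such as python-docx would likely be the more usual choice if it is installed on the cluster.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Package-level parts that every .docx needs.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def _cell(value):
    # One table cell holding a single paragraph; the text is XML-escaped.
    text = escape(str(value))
    return ('<w:tc><w:p><w:r><w:t xml:space="preserve">'
            + text + '</w:t></w:r></w:p></w:tc>')


def _row(values):
    return "<w:tr>" + "".join(_cell(v) for v in values) + "</w:tr>"


def rows_to_docx(columns, rows, path):
    """Write a header row plus data rows as one table in a .docx file.

    Hypothetical helper: `columns` is a list of names, `rows` an iterable
    of tuples/lists with one value per column.
    """
    grid = "".join('<w:gridCol w:w="2000"/>' for _ in columns)
    table = (
        '<w:tbl><w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr>'
        "<w:tblGrid>" + grid + "</w:tblGrid>"
        + _row(columns) + "".join(_row(r) for r in rows) + "</w:tbl>"
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + table + "<w:p/></w:body></w:document>"
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", RELS)
        z.writestr("word/document.xml", document)


# Possible use on Databricks (assumes the /dbfs local mount is available
# and that f_data is small enough to collect to the driver):
# rows_to_docx(f_data.columns,
#              [tuple(r) for r in f_data.collect()],
#              "/dbfs/FileStore/test/test.docx")
```

The helper takes plain column names and row tuples rather than a DataFrame, so the same function works with `f_data.collect()` or with rows taken from pandas.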
