Data Engineering

Not able to generate Excel file in blob through databricks

Databricks143
New Contributor III

Hi team,

I am using a 9.1 cluster on Databricks and am not able to generate an Excel file in blob storage. Below is my configuration:

 

Cluster:9.1.8S

Spark version: 3.1.1

Scala version 3.1.1

Library:

com.crealytics

spark-excel_2.12

Version: 3.1.1_0.18.2

Dependency:

org.apache.poi poi-5.25

Error: Databricks error: NoClassDefFoundError

org/apache/commons/io/output/UnsynchronizedByteArrayOutputStream

Caused by: ClassNotFoundException

org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream at com.crealytics.spark.excel.ExcelFileSaver.save(ExcelFileSaver.scala:61)

 

 

Could you please check and let me know about this?

1 REPLY

Kaniz
Community Manager

Hi @Databricks143, It appears that you're encountering an issue while trying to generate an Excel file in Azure Databricks.

 

Let's troubleshoot this step by step:

 

Library Dependencies:

  • Ensure that the necessary libraries are correctly installed on your Databricks cluster. Specifically, you mentioned using the com.crealytics:spark-excel_2.12:3.1.1_0.18.2 library. Confirm that it's available and properly configured.
  • Additionally, make sure the Apache POI library (org.apache.poi:poi-5.25) is accessible. This library is essential for handling Excel files.

Class Not Found Exception:

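  • The error message indicates a ClassNotFoundException for org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream, a class that belongs to Apache Commons IO.
  • Verify that this class is available on the cluster classpath. If it is missing, a likely cause is that the commons-io version on the cluster is older than the one your POI dependency expects; consider adding a compatible commons-io dependency.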
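One way to check whether the class is present is to scan the JAR files on the driver for the matching `.class` entry. The following is only a minimal sketch: the helper name `jars_containing` is hypothetical, and the `/databricks/jars` directory in the usage comment is an assumption about where the runtime keeps its libraries.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return the names of JAR files under jar_dir that contain class_name."""
    # A JAR is a zip archive; classes are stored as path/to/Class.class entries.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits


MISSING_CLASS = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream"

# Hypothetical usage on the driver; the directory is an assumption:
# print(jars_containing(MISSING_CLASS, "/databricks/jars"))
```

An empty result would suggest that no JAR in that directory provides the class, which is consistent with the ClassNotFoundException above.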

Spark Configuration:

  • To read or write Excel files with Spark, the cluster needs both the Excel library and access to your storage account.
  • Ensure that you've added the com.crealytics:spark-excel_2.12:3.1.1_0.18.2 package to your cluster. You can do this via Maven coordinates in the Databricks UI.
  • To let Spark access your Azure storage account, set the following Hadoop configuration: spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<account_key>"). Replace <account_name> and <account_key> with your Azure storage account details.
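The configuration key and the abfss:// path both follow a fixed pattern, so they can be built from the account and container names. This is a small sketch; the helper names `account_key_conf` and `abfss_uri` are hypothetical and only illustrate the string formats used in this reply.

```python
def account_key_conf(account_name):
    """Build the Hadoop configuration key that holds the storage account key."""
    return f"fs.azure.account.key.{account_name}.dfs.core.windows.net"


def abfss_uri(container, account_name, path):
    """Build an abfss:// URI for a file inside a storage container."""
    # Strip a leading slash so the URI has exactly one separator before the path.
    return f"abfss://{container}@{account_name}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical usage in a notebook where `spark` is already defined:
# spark._jsc.hadoopConfiguration().set(account_key_conf("<account_name>"), "<account_key>")
```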

Reading Excel Files:

  • Once the setup is correct, you can read Excel files using Spark. 
  • Here's an example:
      df = sqlContext.read.format("com.crealytics.spark.excel") \
          .option("header", "true") \
          .load("abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_excel_file>")
      df.show()
  • Replace <container_name>, <storage_account_name>, and <path_to_excel_file> with your specific details.
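Since the original question is about generating an Excel file rather than reading one, the write direction may be more relevant. The sketch below assumes the same spark-excel data source is installed and that `df` is an existing Spark DataFrame; the wrapper name `write_excel` is hypothetical.

```python
def write_excel(df, path, header=True):
    """Write a Spark DataFrame to an Excel file via the spark-excel data source."""
    (
        df.write.format("com.crealytics.spark.excel")
        .option("header", "true" if header else "false")
        .mode("overwrite")  # replace an existing file at the same path
        .save(path)
    )


# Hypothetical usage; fill in your own container, account, and file path:
# write_excel(df, "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_excel_file>")
```

This write path is the one that reaches ExcelFileSaver.save in the stack trace, so it should only succeed once the missing class is resolved.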

Remember to adjust the paths and credentials according to your environment. If you encounter any further issues, feel free to ask for additional assistance! 😊
