Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Not able to generate Excel file in blob through databricks

Databricks143
New Contributor III

Hi team,

I am using a 9.1 cluster in Databricks and am not able to generate an Excel file in blob storage. Below is my configuration:

Cluster: 9.1

Spark version: 3.1.1

Scala version: 2.12

Library:

com.crealytics:spark-excel_2.12:3.1.1_0.18.2

Dependency:

org.apache.poi:poi:5.2.5

Error:

java.lang.NoClassDefFoundError: org/apache/commons/io/output/UnsynchronizedByteArrayOutputStream
Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
	at com.crealytics.spark.excel.ExcelFileSaver.save(ExcelFileSaver.scala:61)

Could you please check and let me know?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Databricks143, it appears that you’re encountering an issue while trying to generate an Excel file in Azure Databricks. 

 

Let’s troubleshoot this step by step:

 

Library Dependencies:

  • Ensure that the necessary libraries are correctly installed on your Databricks cluster. You mentioned using the spark-excel library; for Spark 3.1.1 the matching Maven coordinate is com.crealytics:spark-excel_2.12:3.1.1_0.18.2. Confirm that it’s installed and properly configured.
  • Additionally, make sure the Apache POI library (org.apache.poi:poi:5.2.5) is accessible. This library is essential for handling Excel files.

Class Not Found Exception:

  • The error message indicates a ClassNotFoundException for org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.
  • This class lives in Apache commons-io and was only added in commons-io 2.7; the error shows that the commons-io version visible on your cluster predates it. Installing a newer commons-io (for example, the Maven coordinate org.apache.commons:commons-io:2.11.0) as a cluster library typically resolves this error.
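As a quick sanity check after installing the dependency, you can ask the driver JVM whether the class is now loadable. This is a minimal sketch (the helper name is my own); in a Databricks notebook you would pass spark._jvm as the first argument:

```python
def jvm_class_available(jvm, class_name):
    """Return True if the driver JVM can load class_name, False otherwise."""
    try:
        # Class.forName raises if the class is not on the classpath.
        jvm.java.lang.Class.forName(class_name)
        return True
    except Exception:
        return False

# In a notebook:
# jvm_class_available(spark._jvm,
#                     "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream")
```

If this returns False after installing commons-io, restart the cluster so the new library is picked up.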

Spark Configuration:

  • To read Excel files using Spark, you need to configure the cluster appropriately.
  • Ensure that you’ve added the com.crealytics:spark-excel_2.12:3.1.1_0.18.2 package to your cluster. You can do this via Maven or directly in the Databricks UI.
  • To let Spark access your Azure storage account, set the storage account access key (replace <account_name> and <account_key> with your Azure storage account details; ideally read the key from a secret scope rather than hard-coding it):

    spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<account_key>")
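The Hadoop configuration key above follows a fixed pattern, so a tiny helper (hypothetical, just to avoid typos) can build it from the account name:

```python
def storage_key_conf(account_name):
    # Hadoop configuration key for an ADLS Gen2 storage account access key.
    return f"fs.azure.account.key.{account_name}.dfs.core.windows.net"
```

Usage in a notebook: spark._jsc.hadoopConfiguration().set(storage_key_conf("<account_name>"), "<account_key>").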

Reading Excel Files:

  • Once the setup is correct, you can read Excel files using Spark. 
  • Here’s an example:

    df = spark.read.format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .load("abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_excel_file>")
    df.show()
  • Replace <container_name>, <storage_account_name>, and <path_to_excel_file> with your specific details.
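Since your original question was about generating (writing) an Excel file, note that writing mirrors reading. A small helper for assembling the abfss:// URI may also help (the helper name is my own, not part of spark-excel):

```python
def abfss_path(container, storage_account, path):
    # Build an abfss:// URI for a file in an ADLS Gen2 container.
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"
```

With the libraries installed, an untested write sketch would be: df.write.format("com.crealytics.spark.excel").option("header", "true").mode("overwrite").save(abfss_path("<container_name>", "<storage_account_name>", "report.xlsx")).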

Remember to adjust the paths and credentials according to your environment. If you encounter any further issues, feel free to ask for additional assistance! 😊
