Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Not able to generate Excel file in blob through databricks

Databricks143
New Contributor III

Hi team,

I am using a 9.1 cluster in Databricks and am not able to generate an Excel file in blob storage. Below is my configuration:

Cluster: 9.1

Spark version: 3.1.1

Scala version: 2.12

Library:

com.crealytics:spark-excel_2.12:3.1.1_0.18.2

Dependency:

org.apache.poi:poi:5.2.5

Error:

java.lang.NoClassDefFoundError: org/apache/commons/io/output/UnsynchronizedByteArrayOutputStream
Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
	at com.crealytics.spark.excel.ExcelFileSaver.save(ExcelFileSaver.scala:61)

Could you please check and let me know?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Databricks143, it appears that you’re encountering an issue while trying to generate an Excel file in Azure Databricks. 

 

Let’s troubleshoot this step by step:

 

Library Dependencies:

  • Ensure that the necessary libraries are correctly installed on your Databricks cluster. You mentioned using the spark-excel library; for Spark 3.1.1 the matching Maven coordinate is com.crealytics:spark-excel_2.12:3.1.1_0.18.2. Confirm that it’s installed and properly configured.
  • Additionally, make sure the Apache POI library (org.apache.poi:poi:5.2.5) is accessible. This library is essential for handling Excel files.

Class Not Found Exception:

  • The error message indicates a ClassNotFoundException for org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.
  • This class lives in Apache commons-io and was only added in commons-io 2.7; the error shows that the commons-io version visible on your cluster predates it. Installing a newer commons-io (for example, the Maven coordinate org.apache.commons:commons-io:2.11.0) as a cluster library typically resolves this error.
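As a quick sanity check after installing the dependency, you can ask the driver JVM whether the class is now loadable. This is a minimal sketch (the helper name is my own); in a Databricks notebook you would pass spark._jvm as the first argument:

```python
def jvm_class_available(jvm, class_name):
    """Return True if the driver JVM can load class_name, False otherwise."""
    try:
        # Class.forName raises if the class is not on the classpath.
        jvm.java.lang.Class.forName(class_name)
        return True
    except Exception:
        return False

# In a notebook:
# jvm_class_available(spark._jvm,
#                     "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream")
```

If this returns False after installing commons-io, restart the cluster so the new library is picked up.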

Spark Configuration:

  • To read Excel files using Spark, you need to configure the cluster appropriately.
  • Ensure that you’ve added the com.crealytics:spark-excel_2.12:3.1.1_0.18.2 package to your cluster. You can do this via Maven or directly in the Databricks UI.
  • To let Spark access your Azure storage account, set the storage account access key (replace <account_name> and <account_key> with your Azure storage account details; ideally read the key from a secret scope rather than hard-coding it):

    spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<account_key>")
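The Hadoop configuration key above follows a fixed pattern, so a tiny helper (hypothetical, just to avoid typos) can build it from the account name:

```python
def storage_key_conf(account_name):
    # Hadoop configuration key for an ADLS Gen2 storage account access key.
    return f"fs.azure.account.key.{account_name}.dfs.core.windows.net"
```

Usage in a notebook: spark._jsc.hadoopConfiguration().set(storage_key_conf("<account_name>"), "<account_key>").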

Reading Excel Files:

  • Once the setup is correct, you can read Excel files using Spark. 
  • Here’s an example:

    df = spark.read.format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .load("abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_excel_file>")
    df.show()
  • Replace <container_name>, <storage_account_name>, and <path_to_excel_file> with your specific details.
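Since your original question was about generating (writing) an Excel file, note that writing mirrors reading. A small helper for assembling the abfss:// URI may also help (the helper name is my own, not part of spark-excel):

```python
def abfss_path(container, storage_account, path):
    # Build an abfss:// URI for a file in an ADLS Gen2 container.
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"
```

With the libraries installed, an untested write sketch would be: df.write.format("com.crealytics.spark.excel").option("header", "true").mode("overwrite").save(abfss_path("<container_name>", "<storage_account_name>", "report.xlsx")).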

Remember to adjust the paths and credentials according to your environment. If you encounter any further issues, feel free to ask for additional assistance! 😊
