cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to read excel file using databricks

PraveenSaini
New Contributor

0

I have a excel file as source file and i want to read data from excel file and convert data in data frame using databricks. I have already added maven dependence for Excel file format. when i a tring below code it is giving error .(Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory) But file is available. Please help me on this code.

val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("/FileStore/tables/Airline.xlsx")

34 REPLIES 34

Roshanshekh
New Contributor II

I have a excel file as source file and i want to read data from excel file and convert data in data frame using databricks. I have already added maven dependence for Excel file format. when i a tring below code it is giving error .(Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory) But file is available. Please help me on this code.

val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("/FileStore/tables/Airline.xlsx")

Sanket9669
New Contributor II

Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory

 

Verify that the file path you provided ("/FileStore/tables/Airline.xlsx") is correct. Make sure the file exists at that location. You can use Databricks' file system utilities or commands to list the contents of the directory to confirm the file's presence.

 

Verify that the file you are trying to read is indeed in Excel format (e.g., ".xlsx"). If it's a different format, you may need to use a different method or library to read it.

Mahanatadattu
New Contributor II

@PraveenSaini wrote:

 

0

I have a excel file as source file and i want to read data from excel file and convert data in data frame using databricks. I have already added maven dependence for Excel file format. when i a tring below code it is giving error .(Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory) But file is available. Please help me on this code.

val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "dbfs:/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("dbfs:/FileStore/tables/Airline.xlsx")

 


val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("/FileStore/tables/Airline.xlsx")

Mahanatadattu
New Contributor II

val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "dbfs:/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("dbfs:/FileStore/tables/Airline.xlsx")

Jenish_lodha
New Contributor II

To read an Excel file using Databricks, you can use the Databricks runtime, which supports multiple programming languages such as Python, Scala, and R. Here are the general steps to read an Excel file in Databricks using Python:

1. **Upload the Excel File**:
- First, upload your Excel file to a location that is accessible from your Databricks workspace. You can use Databricks DBFS (Databricks File System), AWS S3, Azure Blob Storage, or any other supported storage.

2. **Create a Databricks Cluster**:
- If you haven't already, create a Databricks cluster to run your code.

3. **Install Required Libraries**:
- You'll need to install the necessary libraries to work with Excel files. In Python, you can use the `pandas` library, which is commonly used for data manipulation.

```python
# Install the pandas library
%pip install pandas
```

4. **Read the Excel File**:
- You can read the Excel file into a Pandas DataFrame using the `pd.read_excel()` function. Provide the path to your Excel file as the argument.

```python
import pandas as pd

# Replace 'dbfs:/path_to_your_excel_file.xlsx' with the actual path to your Excel file
excel_file_path = 'dbfs:/path_to_your_excel_file.xlsx'

# Read the Excel file into a Pandas DataFrame
df = pd.read_excel(excel_file_path)
```

If you're using Scala or R, you can use the respective libraries (e.g., Apache POI for Scala).

5. **Analyze or Process Data**:
- Once you have the data in a DataFrame, you can analyze, process, or visualize it as needed within your Databricks notebook.

6. **Save or Export Results**:
- If you want to save your results or export data back to a storage location, you can use the appropriate Databricks file APIs or libraries.

Remember to adjust the code and file paths according to your specific Databricks setup and file location. Additionally, ensure that you have the necessary permissions to access the Excel file from your Databricks cluster.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group