
How to import Excel on Databricks

OnerFusion-AI
New Contributor

To import an Excel file into Databricks, you can follow these general steps:

1. **Upload the Excel File**:
- Go to the Databricks workspace or cluster where you want to work.
- Navigate to the location where you want to upload the Excel file.
- Click on the "Data" tab in the Databricks workspace and select the folder where you want to upload the file.
- Click the "Upload" button and select your Excel file from your local machine.

2. **Create a DataFrame**:
- Once your Excel file is uploaded, you need to create a DataFrame from it. In Databricks, you typically use Apache Spark for data manipulation. You can use the `spark.read` method to read the Excel file into a DataFrame. Here's an example using Python:

```python
from pyspark.sql import SparkSession

# Create a Spark session (in Databricks notebooks, `spark` is already available)
spark = SparkSession.builder.appName("ExcelImport").getOrCreate()

# Read the Excel file into a DataFrame; the "header" option tells the reader
# that the first row contains column names
excel_df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .load("/FileStore/your_excel_file.xlsx")  # Update with your file path
)
```

Make sure to replace `"/FileStore/your_excel_file.xlsx"` with the correct path to your uploaded Excel file.

3. **Use the DataFrame**:
- Once you have the DataFrame, you can perform various operations on it, such as filtering, aggregating, or transforming the data.
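For example, you might filter and aggregate the data (the column names "amount" and "category" below are only placeholders; replace them with columns that actually exist in your sheet):

```python
from pyspark.sql import functions as F

# Filter rows and aggregate; "amount" and "category" are placeholder column names
summary_df = (
    excel_df
    .filter(F.col("amount") > 0)
    .groupBy("category")
    .agg(F.sum("amount").alias("total_amount"))
)
summary_df.show()
```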

4. **Write Data Back**:
- If you need to save the processed data back to Databricks or export it to another format, you can use the `DataFrame.write` method.
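As a sketch, the `summary_df` from the example above could be saved back to DBFS in Delta format (the output path is only an example):

```python
# Save the processed DataFrame as a Delta table in DBFS;
# mode("overwrite") replaces any existing data at the target path
summary_df.write.format("delta") \
    .mode("overwrite") \
    .save("/FileStore/processed/excel_summary")
```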

Remember to adjust the code according to your specific use case and data. Databricks provides different ways to read Excel files, and you may need to install the necessary libraries or packages depending on your Databricks environment and Spark version. The example above assumes you have the "com.crealytics.spark.excel" library available for reading Excel files.

1 ACCEPTED SOLUTION

AhmedAlnaqa
Contributor

Thanks for these good steps, but you missed one:
- On the cluster page, you have to add the library "com.crealytics:spark-excel_2.12:0.13.5" (at least that version) so that it is available to any notebook attached to the cluster.
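If you prefer to attach the library programmatically instead of through the cluster UI, something like the following sketch using the Libraries API 2.0 should work (the host, token, and cluster ID are placeholders you must supply):

```python
# Sketch: install the Maven coordinate on a cluster via the Libraries API 2.0
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": "<your-cluster-id>",  # placeholder
        "libraries": [
            {"maven": {"coordinates": "com.crealytics:spark-excel_2.12:0.13.5"}}
        ],
    },
)
resp.raise_for_status()
```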


3 REPLIES


AhmedAlnaqa
Contributor

The question here is how to read multiple Excel files based on a path.
The mentioned solution handles only one file; is there a way to read all the Excel files in a folder?
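One workaround I can think of (just a sketch, assuming the same spark-excel library is attached to the cluster and all files share the same schema) is to list the folder and union the per-file DataFrames:

```python
from functools import reduce

# Sketch: read every .xlsx file in a folder and union the results
# (the folder path is a placeholder)
folder = "/FileStore/excel_inputs/"

paths = [f.path for f in dbutils.fs.ls(folder) if f.path.endswith(".xlsx")]

dfs = [
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .load(p)
    for p in paths
]

all_excel_df = reduce(lambda a, b: a.unionByName(b), dfs)
```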
