- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 02:39 AM
Hello, I looked at the documentation but could not find what I wanted. Is there a way to load Excel files using an autoloader and if yes, what options should be given to specify format, sheet name etc ?
Thank you friends !
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 04:12 AM
Hi @erigaud
You can use spark.read.format("com.crealytics.spark.excel") while reading excel files using autoloader and to specify format you need to provide com.crealytics.spark.excel and to specify sheet name you can provide it under options.
Please find the below example code to read load Excel files using an autoloader:
spark.read.format("com.crealytics.spark.excel") \
.option("header", "true") \
.option("dataAddress", "'Data - Current'!A1") \
.option("treatEmptyValuesAsNulls", "true") \
.option("inferSchema", "true") \
.load(location)
And you can also use pandas to read excel files:
import pandas as pd
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 04:12 AM
Hi @erigaud
You can use spark.read.format("com.crealytics.spark.excel") while reading excel files using autoloader and to specify format you need to provide com.crealytics.spark.excel and to specify sheet name you can provide it under options.
Please find the below example code to read load Excel files using an autoloader:
spark.read.format("com.crealytics.spark.excel") \
.option("header", "true") \
.option("dataAddress", "'Data - Current'!A1") \
.option("treatEmptyValuesAsNulls", "true") \
.option("inferSchema", "true") \
.load(location)
And you can also use pandas to read excel files:
import pandas as pd
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 08:31 AM
Where autoloader is being used in the above snippet 🤔 ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 04:23 AM
Thank you very much, great solution !
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-19-2023 04:43 AM
Unfortunately, Databricks autoloader doesn't support Excel file types to incrementally load new files.
Link:https://docs.databricks.com/ingestion/auto-loader/options.html
If your excel file contains a single sheet then there is a workaround.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-20-2023 02:19 AM
What is the workaround ? Is it to transform the excel to a csv file ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-20-2023 08:42 PM
But using autoloader for the conversion into csv.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-20-2023 11:24 PM
How would that work exactly ? I specify to the autoloader that the format is csv and it will be able to pick up the excel files, and load the sheet normally, even if the format is .xlsx ?

