cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Autoloader Excel Files

erigaud
Honored Contributor

Hello, I looked at the documentation but could not find what I wanted. Is there a way to load Excel files using an autoloader and if yes, what options should be given to specify format, sheet name etc ? 

Thank you friends !

1 ACCEPTED SOLUTION

Accepted Solutions

Vinay_M_R
Valued Contributor II
Valued Contributor II

Hi @erigaud 

You can use spark.read.format("com.crealytics.spark.excel") while reading excel files using autoloader and to specify format you need to provide com.crealytics.spark.excel and to specify sheet name you can provide it under options. 

Please find the below example code to read load Excel files using an autoloader:

spark.read.format("com.crealytics.spark.excel") \

.option("header", "true") \

.option("dataAddress", "'Data - Current'!A1") \

.option("treatEmptyValuesAsNulls", "true") \

.option("inferSchema", "true") \

.load(location)

And you can also use pandas to read excel files:

import pandas as pd

pd.read_excel(file ,sheet_name = 0, index_col = 0)

View solution in original post

7 REPLIES 7

Vinay_M_R
Valued Contributor II
Valued Contributor II

Hi @erigaud 

You can use spark.read.format("com.crealytics.spark.excel") while reading excel files using autoloader and to specify format you need to provide com.crealytics.spark.excel and to specify sheet name you can provide it under options. 

Please find the below example code to read load Excel files using an autoloader:

spark.read.format("com.crealytics.spark.excel") \

.option("header", "true") \

.option("dataAddress", "'Data - Current'!A1") \

.option("treatEmptyValuesAsNulls", "true") \

.option("inferSchema", "true") \

.load(location)

And you can also use pandas to read excel files:

import pandas as pd

pd.read_excel(file ,sheet_name = 0, index_col = 0)

Hemant
Valued Contributor II

Where autoloader is being used in the above snippet ๐Ÿค” ?

Hemant Soni

erigaud
Honored Contributor

Thank you very much, great solution !

Hemant
Valued Contributor II

Unfortunately, Databricks autoloader doesn't support Excel file types to incrementally load new files.

Link:https://docs.databricks.com/ingestion/auto-loader/options.html 

If your excel file contains a single sheet then there is a workaround.

Hemant Soni

erigaud
Honored Contributor

What is the workaround ? Is it to transform the excel to a csv file ? 

Hemant
Valued Contributor II

But using autoloader for the conversion into csv.

Hemant Soni

erigaud
Honored Contributor

How would that work exactly ? I specify to the autoloader that the format is csv and it will be able to pick up the excel files, and load the sheet normally, even if the format is .xlsx ?

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.