How to load xlsx Files to Delta Live Tables (DLT)?

avrm91
Databricks Partner

I want to load a .xlsx file to DLT but struggling as it is not available with Autoloader.


With the Assistant I tried to load the .xlsx first to a data frame and then send it to DLT.

 

 

import dlt
from pyspark.sql import SparkSession

# Load xlsx file into DataFrame
df = spark.read \
    .format("com.crealytics.spark.excel") \
    .option("dataAddress", "${my_etl.sheet_address}") \
    .option("header", "${my_etl.header}") \
    .option("inferSchema", "${my_etl.infer_schema}") \
    .option("timestampFormat", "${my_etl.timestamp_format}") \
    .load("${my_etl.input_path}")

@dlt.table
def conformed():
    return df

 

 

but it is not working.

shan_chandra
Databricks Employee
Databricks Employee

@avrm91  - can try dividing xlsx files into a csv as a preprocessing step and ingest them in to a dataframe using Autoloader. Also, you can use openpyxl to load into a dataframe. refer to this doc for example.

 

avrm91
Databricks Partner

For now, I will use the Azure Data Factory to convert the xlsx to csv and then process it within Databricks. 
Overall, I wonder if there will be a xlsx Autoloader implementation in the future, as the preprocessing will not have the Autoloader features and I will need to take care what was already loaded and what not.
All over I would not say that it is a solution so I will let this ticket open.

shan_chandra
Databricks Employee
Databricks Employee

@avrm91 - There is a feature request in place. DB engg will priorotized. As of now no ETA yet on this. 

avrm91
Databricks Partner

Added a feature request into Azure Community Portal
XLSX - DLT Autoloader · Community (azure.com)