Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Table (Streaming Tables) for excel (.xlsx, .xls)

NathanC0926
New Contributor

What's the native way to ingest Excel files using a streaming table? When Excel files land in Unity Catalog, I'd like the pipeline to pick them up and load them into the streaming table.
The data is small, so we can afford some kind of UDF, but we really need to auto-discover new files and ensure exactly-once processing.
Thanks!

#Delta Live Tables

1 REPLY

lingareddy_Alva
Honored Contributor II

Hi @NathanC0926 

Ingesting Excel files with streaming tables requires a combination of Databricks Auto Loader
(for file discovery and exactly-once processing) and a custom UDF for Excel parsing.
Here's the native approach:
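
A minimal sketch of that combination, not a definitive implementation. It assumes openpyxl is installed on the cluster, that only the first worksheet matters, and that the first row is the header. The function names and the `source_path`, `checkpoint_path` and `target_table` parameters are hypothetical. Auto Loader reads each workbook as raw bytes with the `binaryFile` format, and the UDF turns it into one JSON string per row. openpyxl reads .xlsx only, so legacy .xls files would need a different parser such as xlrd.

```python
import io
import json
from typing import Any, Iterable, List, Sequence


def rows_to_json(rows: Iterable[Sequence[Any]]) -> List[str]:
    """Turn a header row plus data rows into one JSON string per data row."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty sheet
        return []
    # Unnamed header cells get a positional fallback name.
    names = [str(h) if h is not None else f"col_{i}" for i, h in enumerate(header)]
    # default=str keeps dates and other non-JSON types as strings.
    return [json.dumps(dict(zip(names, row)), default=str) for row in it]


def parse_xlsx(content: bytes) -> List[str]:
    """Parse the first worksheet of an .xlsx payload (assumption: one sheet is enough)."""
    from openpyxl import load_workbook  # assumed to be installed on the cluster

    wb = load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    try:
        return rows_to_json(wb.worksheets[0].iter_rows(values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the source open until closed


def start_excel_stream(spark, source_path, checkpoint_path, target_table):
    """Start an Auto Loader stream that writes parsed rows to a Delta table."""
    # Imported here so the pure-Python helpers above work without Spark.
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, StringType

    parse_udf = udf(parse_xlsx, ArrayType(StringType()))

    raw = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")  # columns: path, content, ...
        .option("pathGlobFilter", "*.xlsx")
        .load(source_path)
    )
    parsed = raw.select(
        col("path").alias("source_file"),
        explode(parse_udf(col("content"))).alias("row_json"),
    )
    return (
        parsed.writeStream
        .option("checkpointLocation", checkpoint_path)  # tracks ingested files
        .trigger(availableNow=True)  # process what has landed, then stop
        .toTable(target_table)
    )
```

Keeping each row as a JSON string avoids fixing a schema up front; a downstream step could apply `from_json` with an explicit schema once the column layout is known.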

Key Features of This Solution
1. Exactly-Once Processing
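-- Auto Loader records each ingested file, so a file is not picked up twice
-- Uses checkpointing so that processing resumes exactly once after a failure or restart
-- Tracks processed files in the checkpoint location (the schema location holds only the inferred schema)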

2. Auto-Discovery
-- Auto Loader continuously monitors the specified path
-- Automatically picks up new Excel files as they arrive
-- Supports glob patterns for file filtering

3. Native Integration
-- Uses Databricks' native Auto Loader functionality
-- Integrates seamlessly with Unity Catalog
-- Supports the Delta Live Tables (DLT) pattern

Alternative: Using Delta Live Tables
For a more declarative approach, use the DLT version provided in the code. It offers:
-- Built-in data quality monitoring
-- Automatic pipeline orchestration
-- Better integration with UC governance features
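
A hedged sketch of what such a DLT definition might look like, under the same assumptions as the rest of this answer: openpyxl is available to the pipeline, and only the first worksheet with a header row is read. The helper names, the table name `excel_rows_bronze`, and the expectation name are hypothetical. The table is declared inside a factory function so the `dlt` import, which only resolves inside a pipeline, stays out of the pure-Python helpers.

```python
import io
import json
from typing import Dict, List


def autoloader_options(file_glob: str = "*.xlsx") -> Dict[str, str]:
    """Reader options: binaryFile hands each workbook to the UDF as raw bytes."""
    return {"cloudFiles.format": "binaryFile", "pathGlobFilter": file_glob}


def parse_xlsx(content: bytes) -> List[str]:
    """Return one JSON string per data row of the first worksheet."""
    from openpyxl import load_workbook  # assumed to be installed for the pipeline

    wb = load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    try:
        rows = wb.worksheets[0].iter_rows(values_only=True)
        header = [str(h) for h in next(rows, ())]
        return [json.dumps(dict(zip(header, r)), default=str) for r in rows]
    finally:
        wb.close()


def define_excel_table(spark, source_path: str, table_name: str = "excel_rows_bronze"):
    """Declare a streaming table fed by Auto Loader (call once from the pipeline source)."""
    import dlt  # only importable inside a DLT pipeline
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, StringType

    parse_udf = udf(parse_xlsx, ArrayType(StringType()))

    @dlt.table(name=table_name, comment="Rows parsed from Excel files found by Auto Loader")
    @dlt.expect("row_not_empty", "row_json IS NOT NULL")  # recorded as a quality metric
    def excel_rows():
        return (
            spark.readStream.format("cloudFiles")
            .options(**autoloader_options())
            .load(source_path)
            .select(
                col("path").alias("source_file"),
                explode(parse_udf(col("content"))).alias("row_json"),
            )
        )

    return excel_rows
```

In the pipeline source file this would be called once at module level with the `spark` session the pipeline provides and the Unity Catalog volume path where the files land. The pipeline then manages the checkpoint itself, so no `checkpointLocation` is set here.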

Performance Considerations
-- For Small Files: The UDF approach works well
-- For Large Files: Consider pre-processing Excel files to Parquet
-- Memory Management: Pass read_only=True to openpyxl's load_workbook for large files
-- Concurrency: Auto Loader handles parallelization automatically

This solution provides a native way to handle Excel files in a streaming fashion while ensuring exactly-once processing
and auto-discovery of new files in Unity Catalog.

 

LR