Hi @NathanC0926
Ingesting Excel files with streaming tables requires a combination of Databricks Autoloader
(for file discovery and exactly-once processing) and a custom UDF for Excel parsing.
Here's the native approach:
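A minimal sketch of the Autoloader + UDF approach, assuming an .xlsx landing path and a Unity Catalog target table; every path, table, and function name below is an illustrative placeholder, not Databricks-provided API beyond the documented `cloudFiles` options:

```python
# Sketch of Autoloader + openpyxl parsing for Excel ingestion on Databricks.
# Paths, table names, and helper names are illustrative placeholders.
import io

def parse_excel_bytes(content: bytes):
    """Parse the first sheet of an .xlsx payload into a list of row tuples."""
    from openpyxl import load_workbook  # bundled with Databricks runtimes
    wb = load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    rows = list(wb.worksheets[0].iter_rows(values_only=True))
    wb.close()  # read-only workbooks hold a file handle until closed
    return rows

def start_excel_stream(spark, source_path, checkpoint_path, target_table):
    """Discover Excel files with Autoloader and parse each micro-batch."""
    def process_batch(batch_df, batch_id):
        import pandas as pd
        parsed = []
        # collect() is fine for small workbooks; large files should be
        # pre-converted to Parquet instead (see performance notes below)
        for rec in batch_df.select("path", "content").collect():
            for row in parse_excel_bytes(rec["content"]):
                parsed.append((rec["path"],) + row)
        if parsed:
            (spark.createDataFrame(pd.DataFrame(parsed))
                  .write.mode("append").saveAsTable(target_table))

    return (
        spark.readStream.format("cloudFiles")       # Autoloader source
        .option("cloudFiles.format", "binaryFile")  # Excel has no native reader
        .option("pathGlobFilter", "*.xlsx")         # only pick up Excel files
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # exactly-once tracking
        .foreachBatch(process_batch)
        .start()
    )
```

On a real workspace you would call `start_excel_stream(spark, "/Volumes/<catalog>/<schema>/landing", "/Volumes/<catalog>/<schema>/_checkpoints/excel", "<catalog>.<schema>.excel_bronze")` with your own locations.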
Key Features of This Solution
1. Exactly-Once Processing
-- Autoloader automatically handles deduplication
-- Uses checkpointing to ensure files are processed exactly once
-- Tracks processed files in the checkpoint location so none are re-ingested
2. Auto-Discovery
-- Autoloader continuously monitors the specified path
-- Automatically picks up new Excel files as they arrive
-- Supports glob patterns for file filtering
3. Native Integration
-- Uses Databricks' native Autoloader functionality
-- Integrates seamlessly with Unity Catalog
-- Supports Delta Live Tables (DLT) pattern
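The Autoloader options behind features 1 and 2 might look like the following; the paths are placeholders and the exact values depend on your workspace layout:

```python
# Illustrative Autoloader options for exactly-once tracking and file filtering.
# All paths are placeholders.
autoloader_options = {
    "cloudFiles.format": "binaryFile",
    # Autoloader records discovered files under this location so each
    # file is processed exactly once across restarts
    "cloudFiles.schemaLocation": "/Volumes/main/raw/_schemas/excel_ingest",
    # Cap files per micro-batch to keep workbook parsing memory-bounded
    "cloudFiles.maxFilesPerTrigger": "100",
    # Glob pattern so only Excel files are picked up from the landing path
    "pathGlobFilter": "*.xlsx",
}
```

These would be applied with `.options(**autoloader_options)` on the `cloudFiles` reader.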
Alternative: Using Delta Live Tables
For a more declarative approach, the same ingestion can be expressed as a Delta Live Tables pipeline. It offers:
-- Built-in data quality monitoring
-- Automatic pipeline orchestration
-- Better integration with UC governance features
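A sketch of what the DLT version could look like, wrapped in a function because the `dlt` module and the global `spark` session exist only inside a Delta Live Tables pipeline; the table name, expectation, and path are assumptions:

```python
# Sketch of a Delta Live Tables (DLT) version; runs only inside a DLT pipeline.
# Table name, expectation, and path below are illustrative placeholders.
def define_excel_dlt_table(spark):
    import dlt  # available only in Delta Live Tables pipelines

    @dlt.table(name="bronze_excel_raw",
               comment="Raw Excel files discovered by Autoloader")
    @dlt.expect("nonempty_file", "length(content) > 0")  # data quality check
    def bronze_excel_raw():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "binaryFile")
            .option("pathGlobFilter", "*.xlsx")
            .load("/Volumes/main/raw/excel_landing")  # placeholder path
        )
```

In a DLT notebook the function body would sit at top level; the `@dlt.expect` decorator is what provides the built-in quality monitoring mentioned above.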
Performance Considerations
-- For Small Files: The UDF approach works well
-- For Large Files: Consider pre-processing Excel files to Parquet
-- Memory Management: Use read_only=True in openpyxl for large files
-- Concurrency: Autoloader handles parallelization automatically
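The memory-management point can be illustrated with a small standalone function: `read_only=True` makes openpyxl stream rows lazily instead of materializing the whole workbook in memory (the function name is mine):

```python
# Memory-bounded row count using openpyxl's read_only mode: rows are
# streamed lazily rather than loading the entire workbook into memory.
import io
from openpyxl import load_workbook

def count_rows(xlsx_bytes: bytes) -> int:
    wb = load_workbook(io.BytesIO(xlsx_bytes), read_only=True)
    n = sum(1 for _ in wb.worksheets[0].iter_rows(values_only=True))
    wb.close()  # read-only workbooks keep the source open until closed
    return n
```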
This solution provides a native way to handle Excel files in a streaming fashion while ensuring exactly-once processing
and auto-discovery of new files in Unity Catalog.
LR