<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Delta Live Table (Streaming Tables) for excel (.xlsx, .xls) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-table-streaming-tables-for-excel-xlsx-xls/m-p/120101#M46063</link>
    <description>&lt;P&gt;What's the native way to ingest Excel files using a streaming table? When Excel files land in Unity Catalog, I'd like them to be picked up and loaded into the streaming table automatically.&lt;BR /&gt;The data is small, so we can afford some kind of UDF, but we really need to auto-discover new files and ensure exactly-once processing.&lt;BR /&gt;Thanks!&lt;/P&gt;&lt;P&gt;#Delta Live Tables&lt;/P&gt;</description>
    <pubDate>Fri, 23 May 2025 16:57:21 GMT</pubDate>
    <dc:creator>NathanC0926</dc:creator>
    <dc:date>2025-05-23T16:57:21Z</dc:date>
    <item>
      <title>Delta Live Table (Streaming Tables) for excel (.xlsx, .xls)</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-streaming-tables-for-excel-xlsx-xls/m-p/120101#M46063</link>
      <description>&lt;P&gt;What's the native way to ingest Excel files using a streaming table? When Excel files land in Unity Catalog, I'd like them to be picked up and loaded into the streaming table automatically.&lt;BR /&gt;The data is small, so we can afford some kind of UDF, but we really need to auto-discover new files and ensure exactly-once processing.&lt;BR /&gt;Thanks!&lt;/P&gt;&lt;P&gt;#Delta Live Tables&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 16:57:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-streaming-tables-for-excel-xlsx-xls/m-p/120101#M46063</guid>
      <dc:creator>NathanC0926</dc:creator>
      <dc:date>2025-05-23T16:57:21Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Table (Streaming Tables) for excel (.xlsx, .xls)</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-table-streaming-tables-for-excel-xlsx-xls/m-p/120119#M46070</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/165714"&gt;@NathanC0926&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Ingesting Excel files into a streaming table requires a combination of Databricks Auto Loader&lt;BR /&gt;(for file discovery and exactly-once processing) and a custom UDF for Excel parsing.&lt;BR /&gt;Here's the native approach.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Key Features of This Solution&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;1. Exactly-Once Processing&lt;/STRONG&gt;&lt;BR /&gt;-- Auto Loader records each file it ingests, so a file is not processed twice&lt;BR /&gt;-- Uses checkpointing to ensure files are processed exactly once&lt;BR /&gt;-- Tracks processed files in the checkpoint location, which a DLT pipeline manages for you&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Auto-Discovery&lt;/STRONG&gt;&lt;BR /&gt;-- Auto Loader continuously monitors the specified path&lt;BR /&gt;-- Automatically picks up new Excel files as they arrive&lt;BR /&gt;-- Supports glob patterns for file filtering&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Native Integration&lt;/STRONG&gt;&lt;BR /&gt;-- Uses Databricks' native Auto Loader functionality&lt;BR /&gt;-- Integrates seamlessly with Unity Catalog&lt;BR /&gt;-- Supports the Delta Live Tables (DLT) pattern&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Alternative: Using Delta Live Tables&lt;/STRONG&gt;&lt;BR /&gt;For a more declarative approach, use the DLT version provided in the code. 
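&lt;/P&gt;&lt;P&gt;The code referred to above is not included in this post. As a stand-in, here is a minimal sketch of the DLT variant, not the original author's code: Auto Loader reads each workbook in binaryFile mode, and a Python UDF parses the bytes with openpyxl. The function names (rows_to_records, parse_xlsx, define_excel_bronze), the table name excel_bronze, the _source_file column, and the all-strings row schema are hypothetical choices made for illustration. openpyxl reads .xlsx only, so .xls files would need a different parser.&lt;/P&gt;&lt;PRE&gt;

```python
import io


def rows_to_records(rows, source_path):
    """Turn worksheet rows (first row = header) into string-valued dicts."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return []
    # Fall back to positional names for blank header cells.
    names = [str(h).strip() if h is not None else "col_%d" % i
             for i, h in enumerate(header)]
    records = []
    for row in rows:
        if all(v is None for v in row):
            continue  # skip fully empty rows
        rec = {n: (None if v is None else str(v)) for n, v in zip(names, row)}
        rec["_source_file"] = source_path  # hypothetical lineage column
        records.append(rec)
    return records


def parse_xlsx(content, source_path):
    """Parse raw .xlsx bytes with openpyxl (third-party, assumed installed)."""
    from openpyxl import load_workbook
    wb = load_workbook(io.BytesIO(content), read_only=True, data_only=True)
    try:
        # Only the active sheet is read in this sketch.
        return rows_to_records(wb.active.iter_rows(values_only=True), source_path)
    finally:
        wb.close()  # read-only workbooks keep the file handle open


def define_excel_bronze(spark, source_path, table_name="excel_bronze"):
    """Register a DLT streaming table fed by Auto Loader in binaryFile mode."""
    import dlt  # only importable inside a DLT pipeline
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, MapType, StringType

    parse_udf = F.udf(parse_xlsx, ArrayType(MapType(StringType(), StringType())))

    @dlt.table(name=table_name, comment="Rows parsed from Excel files")
    def excel_bronze():
        files = (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "binaryFile")
            .option("pathGlobFilter", "*.xlsx")
            .load(source_path)
        )
        # One output row per worksheet row, as a map of column name to string.
        return files.select(F.explode(parse_udf("content", "path")).alias("row"))

    return excel_bronze
```

&lt;/PRE&gt;&lt;P&gt;In a pipeline notebook this would be wired up with a single call such as define_excel_bronze(spark, source_path), where source_path is the Unity Catalog volume directory the files land in. Keeping rows_to_records free of Spark and openpyxl makes the row-shaping logic easy to unit test.&lt;/P&gt;&lt;P&gt;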
The DLT version offers:&lt;BR /&gt;-- Built-in data quality monitoring&lt;BR /&gt;-- Automatic pipeline orchestration&lt;BR /&gt;-- Better integration with Unity Catalog governance features&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Performance Considerations&lt;/STRONG&gt;&lt;BR /&gt;-- For Small Files: The UDF approach works well&lt;BR /&gt;-- For Large Files: Consider pre-processing Excel files to Parquet&lt;BR /&gt;-- Memory Management: Use read_only=True in openpyxl's load_workbook for large files&lt;BR /&gt;-- Concurrency: Auto Loader handles parallelization automatically&lt;/P&gt;&lt;P&gt;This solution provides the native way to handle Excel files in a streaming fashion while ensuring exactly-once processing&lt;BR /&gt;and auto-discovery of new files in Unity Catalog.&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 23:12:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-table-streaming-tables-for-excel-xlsx-xls/m-p/120119#M46070</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-05-23T23:12:31Z</dc:date>
    </item>
  </channel>
</rss>

