11-24-2025 07:58 AM
I am in the process of designing a Medallion architecture where the data sources include REST API calls, JSON files, SQL Server, and Azure Event Hubs.
For the Silver and Gold layers, I plan to leverage Delta Live Tables (DLT). However, I am seeking guidance on the most effective approach to implement the Bronze layer, particularly in combination with Autoloader.
Specifically, for JSON file ingestion, I intend to use Autoloader with the trigger(availableNow=True) option. My understanding is that this option is not currently supported within DLT pipelines.
Could you please advise on recommended practices for implementing the Bronze layer to handle both batch and streaming ingestion scenarios in DLT, while ensuring compatibility with Autoloader?
Labels: Delta Lake, Spark, Workflows
11-26-2025 04:40 AM
The main constraint when building the Bronze layer with Delta Live Tables (DLT) and Auto Loader is that DLT manages stream triggers itself, so you cannot call trigger(availableNow=True) inside a pipeline's table definitions. Batch-style ingestion therefore has to come either from how the pipeline is run or from a job orchestrated outside DLT. Note that a DLT pipeline run in triggered (non-continuous) mode processes whatever data is available when the update starts and then stops, which gives much the same behavior as availableNow without setting the trigger yourself.
Bronze Layer Best Practices
- For most JSON file ingestion scenarios, define a DLT streaming table that reads with Auto Loader (the cloudFiles source). Auto Loader picks up new files incrementally, supports schema inference and evolution, and works with DLT expectations for quality checks.
- Store the ingested raw data as a Delta table to maximize compatibility with downstream Silver and Gold transformations.
- For mixed batch and streaming requirements, design your Bronze ingestion so that:
  - Azure Event Hubs and other continuous sources use Structured Streaming within DLT.
  - REST API calls and bulk JSON files are handled by external batch processes that land files in a storage location watched by Auto Loader. Even though you cannot set availableNow in DLT, each pipeline update still processes only the files that arrived since the previous one. Alternatively, set up a separate process for batch-triggered ingestion.
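As a minimal sketch of the first bullet, the Auto Loader read can live in a plain function that a DLT table definition then wraps. The function name `read_bronze_json`, the table name, the landing path, and the two added columns (`source_file`, `ingested_at`) are illustrative assumptions, not part of the original answer. No schema location is set here because DLT is expected to manage it for Auto Loader inside a pipeline.

```python
def read_bronze_json(spark, landing_path):
    """Return a streaming DataFrame over JSON files under landing_path.

    `spark` is the active SparkSession (provided automatically in
    Databricks notebooks and DLT pipelines).
    """
    return (
        spark.readStream.format("cloudFiles")            # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")   # infer types, not all strings
        .load(landing_path)
        # Assumed lineage columns; `_metadata` is the hidden file-metadata
        # column that Databricks exposes for file-based sources.
        .selectExpr(
            "*",
            "_metadata.file_path AS source_file",
            "current_timestamp() AS ingested_at",
        )
    )


# Inside a DLT pipeline notebook (where `dlt` and `spark` are available),
# the function above could be wrapped like this. The table name and path
# are hypothetical:
#
#   import dlt
#
#   @dlt.table(name="bronze_events_json", comment="Raw JSON files via Auto Loader")
#   def bronze_events_json():
#       return read_bronze_json(spark, "/Volumes/main/landing/json_events")
```

Keeping the reader separate from the `@dlt.table` decorator is a design choice made here so the same function can also be reused by a job outside DLT.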
Handling Batch and Streaming Together
- If you require batch semantics (one-off or scheduled ingestion of a discrete set of files), consider running a separate Spark job (outside of DLT) that uses Auto Loader with trigger(availableNow=True) and writes the output to a Bronze Delta table, which your DLT pipeline then reads as a source table.
- For ongoing streaming or micro-batch ingestion, define your Bronze tables as DLT streaming tables that read with Auto Loader. Real-time and near-real-time data then land in the same Bronze layer as the batch output.
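The separate batch job described above could look roughly like the sketch below. The function name `run_bronze_catch_up`, its parameters, and the table and path names in the comments are assumptions made for illustration. Outside DLT, Auto Loader needs an explicit schema location for schema inference, and the stream needs its own checkpoint location.

```python
def run_bronze_catch_up(spark, landing_path, schema_path, checkpoint_path, target_table):
    """Ingest all files currently in landing_path, then stop.

    Runs as a regular Databricks job outside DLT. `spark` is the active
    SparkSession; all paths and the table name are supplied by the caller.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)  # where the inferred schema is tracked
        .load(landing_path)
    )
    query = (
        stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)     # remembers which files were ingested
        .trigger(availableNow=True)                         # process the backlog, then stop
        .toTable(target_table)
    )
    query.awaitTermination()  # block until the backlog has been written
    return query


# Hypothetical call from a scheduled job:
#
#   run_bronze_catch_up(
#       spark,
#       landing_path="/Volumes/main/landing/json_events",
#       schema_path="/Volumes/main/landing/_schemas/json_events",
#       checkpoint_path="/Volumes/main/landing/_checkpoints/json_events",
#       target_table="main.bronze.events_json",
#   )
#
# A DLT pipeline could then read that table as a streaming source, e.g.
# spark.readStream.table("main.bronze.events_json") inside a @dlt.table function.
```

Because the checkpoint records which files were already ingested, rerunning the job on a schedule only picks up files that arrived since the previous run.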
Summary Table: Ingestion Options
| Source Type | Recommended Ingestion in Bronze Layer | Batch/Streaming | How It Connects to DLT |
|---|---|---|---|
| JSON Files (Batch) | Spark job with Auto Loader and trigger(availableNow=True), writing to Delta | Batch | DLT reads the output table as a source |
| JSON Files (Streaming) | DLT streaming table using Auto Loader | Streaming | Defined in DLT; trigger cannot be set in code |
| Event Hubs | Structured Streaming in DLT | Streaming | Defined in DLT |
| REST API | Orchestrate API pulls, land files, ingest as JSON files above | Batch/Streaming | External orchestration, then DLT |
| SQL Server | Periodic extract or change data capture, land to files or Delta | Batch/Streaming | External ingest, then DLT |
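For the Event Hubs row, one common route is to read through the Event Hubs Kafka-compatible endpoint with the Structured Streaming Kafka source. The sketch below only builds the connector options; the helper name `event_hubs_kafka_options` and its parameters are hypothetical, and the `kafkashaded.` prefix on the login module is an assumption that applies to Databricks runtimes.

```python
def event_hubs_kafka_options(namespace, event_hub, connection_string):
    """Build Kafka-source options for an Event Hubs namespace.

    connection_string is the Event Hubs connection string; in practice it
    should come from a secret scope rather than being hard-coded.
    """
    jaas = (
        # Assumption: Databricks ships a shaded Kafka client, hence the prefix.
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,                 # the event hub acts as the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "earliest",
    }


# Hypothetical use inside a DLT pipeline notebook:
#
#   import dlt
#
#   @dlt.table(name="bronze_event_hubs")
#   def bronze_event_hubs():
#       opts = event_hubs_kafka_options(
#           "my-namespace", "my-hub", dbutils.secrets.get("my-scope", "eh-conn")
#       )
#       return spark.readStream.format("kafka").options(**opts).load()
```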
By decoupling batch "catch-up" and ongoing streaming in the Bronze layer, you ensure compatibility, recoverability, and optimal use of DLT's features.