Best Practices for implementing DLT, Autoloader in...

Swathik · 11-24-2025

I am in the process of designing a Medallion architecture where the data sources include REST API calls, JSON files, SQL Server, and Azure Event Hubs.

For the Silver and Gold layers, I plan to leverage Delta Live Tables (DLT). However, I am seeking guidance on the most effective approach to implement the Bronze layer, particularly in combination with Autoloader.

Specifically, for JSON file ingestion, I intend to use Autoloader with the trigger(availableNow=True) option. My understanding is that this option is not currently supported within DLT pipelines.

Could you please advise on recommended practices for implementing the Bronze layer to handle both batch and streaming ingestion scenarios in DLT, while ensuring compatibility with Autoloader?

mark_ott · 11-26-2025

In a Medallion architecture built on Delta Live Tables (DLT), the Bronze layer has to cover both batch and streaming ingestion, and Autoloader can serve both. You cannot call trigger(availableNow=True) on a stream defined inside a DLT pipeline, because DLT manages triggers itself: the pipeline mode (triggered or continuous) decides how often data is processed. A triggered pipeline update processes whatever data is currently available and then stops, which is close to availableNow behavior. If you need explicit control over the trigger, that ingestion must be orchestrated outside DLT.

Bronze Layer Best Practices

- For most JSON file ingestion scenarios, use Autoloader in streaming mode inside DLT. It picks up new files incrementally as they arrive, supports schema inference and evolution, and works with DLT expectations for data quality checks.
- Store the ingested raw data as a Delta table so that the downstream Silver and Gold transformations can read it directly.
- For mixed batch and streaming requirements, design your Bronze ingestion so that:
  - Azure Event Hubs and other continuous sources use Structured Streaming within DLT.
  - REST API calls and bulk JSON files are handled by external batch processes that land files in a storage location watched by Autoloader. Even though you cannot set availableNow in DLT, you can still process new files incrementally as they arrive, or set up a separate process for batch-triggered ingestion.
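To make the streaming side concrete, here is a minimal sketch of two Bronze streaming tables, one fed by Autoloader and one by Event Hubs through its Kafka-compatible endpoint. The landing path, table names, and helper functions are hypothetical, and the tables are registered inside a function only so the sketch can be imported outside Databricks. In a pipeline notebook the decorated functions could just as well sit at top level.

```python
# Sketch only: paths, table names, and helper names are hypothetical.

LANDING_PATH = "/Volumes/main/landing/json"  # hypothetical landing folder


def autoloader_json_options():
    """Autoloader options for raw JSON files.

    No schema or checkpoint location is set here because DLT manages both.
    """
    return {
        "cloudFiles.format": "json",
        # New columns are added to the tracked schema (the Autoloader default).
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def eventhubs_kafka_options(namespace, eventhub_name, connection_string):
    """Kafka source options for the Event Hubs Kafka-compatible endpoint."""
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": eventhub_name,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "earliest",
    }


def register_bronze_tables(dlt, spark, eventhubs_options):
    """Declare both Bronze streaming tables on the given dlt module."""

    @dlt.table(name="bronze_json_files", comment="Raw JSON files ingested with Autoloader")
    def bronze_json_files():
        return (
            spark.readStream.format("cloudFiles")
            .options(**autoloader_json_options())
            .load(LANDING_PATH)
        )

    @dlt.table(name="bronze_eventhubs", comment="Raw events from Azure Event Hubs")
    def bronze_eventhubs():
        return spark.readStream.format("kafka").options(**eventhubs_options).load()
```

Inside the pipeline you would `import dlt` and call `register_bronze_tables(dlt, spark, eventhubs_kafka_options(...))`, reading the connection string from a secret scope rather than hard-coding it. Whether these tables run continuously or as a one-shot update is then decided by the pipeline mode, not by the code.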

Handling Batch and Streaming Together

If you require batch semantics (one-off or scheduled ingestion of a discrete set of files), consider running a separate Spark job outside of DLT that uses Autoloader with trigger(availableNow=True) and writes to a Bronze Delta table. Your DLT pipeline can then read that table as a source.
For ongoing streaming or micro-batch ingestion, define your Bronze tables as DLT streaming tables fed by Autoloader. Both paths then land data in the same Bronze layer, whether it arrives in real time or in periodic batches.
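As a rough illustration of the external batch path, the sketch below shows what such a job could look like when run as a scheduled notebook or job task outside DLT. The function name and all arguments are hypothetical, and reusing the checkpoint path as the Autoloader schema location is an assumption, not a requirement.

```python
# Sketch only: function name and arguments are hypothetical.

def run_bronze_catch_up(spark, source_path, target_table, checkpoint_path):
    """Ingest every file currently in source_path into target_table, then stop."""
    raw = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # Outside DLT, Autoloader needs a place to track the inferred schema.
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
    )
    query = (
        raw.writeStream
        # The checkpoint records which files were already ingested.
        .option("checkpointLocation", checkpoint_path)
        # Process all available files, possibly in several micro-batches, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    query.awaitTermination()
    return query
```

Because the checkpoint remembers processed files, rerunning the job on a schedule only picks up files that arrived since the last run. A downstream DLT table could then consume the result incrementally with something like `spark.readStream.table("<catalog>.<schema>.<table>")`.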

Summary Table: Ingestion Options

Source Type	Recommended Ingestion in Bronze Layer	Batch/Streaming	Integration with DLT
JSON Files (Batch)	Spark job with Autoloader, trigger(availableNow=True), write to Delta	Batch	Register output as source table
JSON Files (Streaming)	DLT streaming table using Autoloader	Streaming	Full DLT support, no availableNow
Event Hubs	Structured Streaming in DLT	Streaming	Native DLT support
REST API	Orchestrate API pulls, land files, ingest as above	Batch/Streaming	External orchestration, then DLT
SQL Server	Periodic extract or change data capture, land to files or Delta	Batch/Streaming	External ingest, then DLT
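For the REST API row, the "land files" step could be as simple as the sketch below: each API pull is written as one newline-delimited JSON file with a unique name into the folder Autoloader watches. The function name and the landing directory are hypothetical, and the HTTP call itself is left out, so `records` stands for whatever the API returned.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def land_records_as_json(records, landing_dir):
    """Write one newline-delimited JSON file per API pull and return its path."""
    landing = Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)

    # A timestamp plus a random suffix keeps every pull in its own new file,
    # so an incremental reader sees it as a new arrival.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = landing / f"pull_{stamp}_{uuid.uuid4().hex}.json"

    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return path
```

A scheduled job would call this after each API request, for example `land_records_as_json(api_response_items, "/Volumes/main/landing/json")`, where the volume path is again only an assumed location. The same landing pattern would apply to periodic SQL Server extracts.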

By decoupling batch "catch-up" and ongoing streaming in the Bronze layer, you ensure compatibility, recoverability, and optimal use of DLT's features.
