How to Implement Incremental Loading in Azure Databricks for ETL

chexa_Wee — Thu, 22 May 2025 06:09:29 GMT

Hi everyone,

I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.

Right now, the notebook loads all data from the beginning every time it runs, which is inefficient and causes unnecessary processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.

My setup:

Source: Azure SQL Database
Target: Databricks Delta Table
Scheduler: Daily Databricks job
Purpose: Power BI dashboards using processed data

What I'm looking for:

A standard or recommended approach to implement incremental loading in Databricks
Best practices for tracking the last load timestamp (e.g., using a watermark)
Example code or a step-by-step tutorial
Any built-in Databricks utilities or patterns to support this on the Standard Tier

If you've set this up before or know of any good resources, I’d really appreciate your help!

Thanks in advance!

Re: How to Implement Incremental Loading in Azure Databricks for ETL

nikhilj0421 — Fri, 23 May 2025 05:12:37 GMT

Hi @chexa_Wee, you can use the DLT (Delta Live Tables) feature for this.

Please check: https://docs.databricks.com/aws/en/dlt/transform

https://docs.databricks.com/aws/en/dlt/stateful-processing

Here is the step-by-step tutorial: https://docs.databricks.com/aws/en/dlt/tutorials
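
Separately from DLT, the watermark pattern mentioned in the question can be sketched in a plain notebook. This is only a minimal sketch under assumptions, not a definitive implementation: the table names (dbo.sales, analytics.sales, analytics.etl_watermarks), the modified_at and id columns, and the run_incremental_load helper are all hypothetical, and it assumes the source table has a reliable last-modified datetime column without time zone. The idea is to read the last stored watermark, pull only rows newer than it over JDBC, MERGE them into the Delta table, and then store the new watermark.

```python
from datetime import datetime

# All names below are hypothetical placeholders; adjust to your schema.
SOURCE_TABLE = "dbo.sales"                   # assumed Azure SQL source table
WATERMARK_COLUMN = "modified_at"             # assumed last-modified column
KEY_COLUMN = "id"                            # assumed primary key
TARGET_TABLE = "analytics.sales"             # assumed Delta target table
CONTROL_TABLE = "analytics.etl_watermarks"   # assumed: (table_name STRING, last_value TIMESTAMP)


def build_incremental_query(table, column, last_watermark):
    """Build a SELECT that fetches only rows changed after last_watermark."""
    if last_watermark is None:
        return f"SELECT * FROM {table}"      # first run: full load
    # ISO 8601 literal with millisecond precision (naive datetime assumed).
    ts = last_watermark.isoformat(sep="T", timespec="milliseconds")
    return f"SELECT * FROM {table} WHERE {column} > '{ts}'"


def run_incremental_load(spark, jdbc_url, connection_properties):
    """One scheduled run; requires a Spark session with Delta Lake available."""
    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    # 1. Read the watermark stored by the last successful run, if any.
    row = (spark.table(CONTROL_TABLE)
           .where(F.col("table_name") == SOURCE_TABLE)
           .select("last_value")
           .first())
    last_watermark = row["last_value"] if row else None

    # 2. Push the filter down to Azure SQL so only changed rows are transferred.
    query = build_incremental_query(SOURCE_TABLE, WATERMARK_COLUMN, last_watermark)
    changes = spark.read.jdbc(
        url=jdbc_url,
        table=f"({query}) AS src",
        properties=connection_properties,
    ).persist()  # reuse the same extract for the max() and the merge

    new_watermark = changes.agg(F.max(WATERMARK_COLUMN)).first()[0]
    if new_watermark is None:
        return  # nothing new since the last run

    # 3. Upsert the changed rows into the Delta target.
    (DeltaTable.forName(spark, TARGET_TABLE).alias("t")
        .merge(changes.alias("s"), f"t.{KEY_COLUMN} = s.{KEY_COLUMN}")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # 4. Store the new watermark only after the merge has succeeded.
    spark.sql(f"""
        MERGE INTO {CONTROL_TABLE} t
        USING (SELECT '{SOURCE_TABLE}' AS table_name,
                      TIMESTAMP'{new_watermark}' AS last_value) s
        ON t.table_name = s.table_name
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

Because the watermark is written last, a failed run leaves the old value in place and the next run re-reads the same rows; the MERGE on the key makes that retry safe. Note that this sketch does not detect rows deleted in the source; that would need a soft-delete flag or change tracking on the SQL side.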
