Hi everyone,
I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.
Right now, the notebook does a full reload — it re-extracts the entire source table every run — which is inefficient and wastes processing time. I want to switch to incremental loading, so each run only fetches records that are new or changed since the last successful run.
My setup:
- Source: Azure SQL Database
- Target: Databricks Delta Table
- Scheduler: Daily Databricks job
- Purpose: Power BI dashboards using processed data
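For context, my current notebook logic is roughly the sketch below (the table name `dbo.Orders`, `jdbc_url`, and the target table `bronze.orders` are simplified placeholders, not my real schema):

```python
def build_full_query(table: str) -> str:
    """Full extract -- this is what runs today, every single day."""
    return f"(SELECT * FROM {table}) AS src"

# In the daily notebook (sketch, placeholder names):
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)  # Azure SQL connection string
#       .option("dbtable", build_full_query("dbo.Orders"))
#       .load())
# df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")
```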
What I'm looking for:
- A standard or recommended approach to implement incremental loading in Databricks
- Best practices for tracking the last load timestamp (e.g., using a watermark)
- Example code or a step-by-step tutorial
- Any built-in Databricks utilities or patterns to support this on the Standard Tier
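To make the question concrete, here's roughly the watermark pattern I *think* I'm after — a pushdown query filtered on a modified-timestamp column, with the last watermark persisted between runs. All names here (`dbo.Orders`, `ModifiedDate`, the control table) are made-up placeholders, and I haven't tested the Spark part. Is this on the right track?

```python
from datetime import datetime

def build_incremental_query(table: str, watermark_col: str,
                            last_watermark: datetime) -> str:
    """Build a pushdown query that fetches only rows changed since the
    last successful run (watermark pattern). Assumes the source table
    has a reliably-updated timestamp column like ModifiedDate."""
    ts = last_watermark.strftime("%Y-%m-%d %H:%M:%S")
    return (f"(SELECT * FROM {table} "
            f"WHERE {watermark_col} > '{ts}') AS src")

# In the daily job (sketch, placeholder names):
# 1. Read the last successful watermark from a small control Delta table.
# last_wm = (spark.table("etl_control.watermarks")
#            .filter("source = 'dbo.Orders'")
#            .agg({"last_loaded": "max"}).collect()[0][0])
#
# 2. Pull only new/changed rows from Azure SQL via JDBC pushdown.
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", build_incremental_query(
#           "dbo.Orders", "ModifiedDate", last_wm))
#       .load())
#
# 3. Append (or MERGE, to handle updates) into the Delta target,
#    then advance the watermark only after the write succeeds.
# df.write.format("delta").mode("append").saveAsTable("bronze.orders")
```

I'm especially unsure whether appending is enough or whether I need a Delta `MERGE` to handle updated rows, and where best to store the watermark.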
If you've set this up before or know of any good resources, I’d really appreciate your help!
Thanks in advance!