Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Implement Incremental Loading in Azure Databricks for ETL

chexa_Wee
New Contributor III

Hi everyone,

I'm currently working on an ETL process that loads data from an Azure SQL Database into Azure Databricks (Standard Tier). I run a notebook daily to extract, transform, and load the data for Power BI reports.

Right now, the notebook reloads all of the data from the beginning on every run, which is inefficient and wastes processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.

My setup:

  • Source: Azure SQL Database
  • Target: Databricks Delta Table
  • Scheduler: Daily Databricks job
  • Purpose: Power BI dashboards using processed data
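
To make this concrete, here is roughly the extract pattern I have in mind: keep the last successful load timestamp in a small Delta control table, push the filter down to Azure SQL over JDBC, and pull only rows modified since then. This is just an untested sketch; the server details, secret scope, and the table/column names (etl_control.etl_watermark, dbo.Orders, last_modified) are placeholders for whatever the real schema would use, and it assumes the source table has a reliable last-modified column.

from pyspark.sql import functions as F

# Connection details are placeholders; in the real notebook they would come from a secret scope.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"
connection_props = {
    "user": dbutils.secrets.get("my-scope", "sql-user"),
    "password": dbutils.secrets.get("my-scope", "sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# 1. Look up the last successful watermark from a small Delta control table.
last_watermark = (
    spark.table("etl_control.etl_watermark")
    .filter(F.col("source_table") == "dbo.Orders")
    .agg(F.max("watermark_value"))
    .first()[0]
) or "1900-01-01 00:00:00"  # fall back to "load everything" on the very first run

# 2. Push the filter down to Azure SQL so only new or changed rows cross the wire.
incremental_query = f"(SELECT * FROM dbo.Orders WHERE last_modified > '{last_watermark}') AS src"
incremental_df = spark.read.jdbc(url=jdbc_url, table=incremental_query, properties=connection_props)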

What I'm looking for:

  • A standard or recommended approach to implement incremental loading in Databricks
  • Best practices for tracking the last load timestamp (e.g., using a watermark)
  • Example code or a step-by-step tutorial
  • Any built-in Databricks utilities or patterns to support this on the Standard Tier

If you've set this up before or know of any good resources, I’d really appreciate your help!
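
In case it helps to see where I'm at, this is the rough load step I imagine following the extract sketch above: MERGE the incremental batch into the Delta target and only then advance the watermark, so a failed run can be retried from the same point the next day. Again untested, and the key column id plus the table names are placeholders.

from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Upsert the incremental batch (incremental_df from the extract step) into the Delta target.
target = DeltaTable.forName(spark, "analytics.orders")
(
    target.alias("t")
    .merge(incremental_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Advance the watermark only after the merge succeeds.
new_watermark = incremental_df.agg(F.max("last_modified")).first()[0]
if new_watermark is not None:
    spark.sql(
        f"UPDATE etl_control.etl_watermark "
        f"SET watermark_value = '{new_watermark}' "
        f"WHERE source_table = 'dbo.Orders'"
    )

Does something along these lines look reasonable, or is there a more standard pattern available on the Standard Tier that I should be using instead?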

Thanks in advance!

1 REPLY

nikhilj0421
Databricks Employee
