<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to Implement Incremental Loading in Azure Databricks for ETL in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119688#M45944</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.&lt;/P&gt;&lt;P&gt;Right now, the notebook loads all data from the beginning every time it runs, which is inefficient and causes unnecessary processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.&lt;/P&gt;&lt;P&gt;My setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Source: Azure SQL Database&lt;/LI&gt;&lt;LI&gt;Target: Databricks Delta Table&lt;/LI&gt;&lt;LI&gt;Scheduler: Daily Databricks job&lt;/LI&gt;&lt;LI&gt;Purpose: Power BI dashboards using processed data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What I'm looking for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A standard or recommended approach to implement incremental loading in Databricks&lt;/LI&gt;&lt;LI&gt;Best practices for tracking the last load timestamp (e.g., using a watermark)&lt;/LI&gt;&lt;LI&gt;Example code or a step-by-step tutorial&lt;/LI&gt;&lt;LI&gt;Any built-in Databricks utilities or patterns to support this on the Standard Tier&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you've set this up before or know of any good resources, I’d really appreciate your help!&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Tue, 20 May 2025 05:15:43 GMT</pubDate>
    <dc:creator>chexa_Wee</dc:creator>
    <dc:date>2025-05-20T05:15:43Z</dc:date>
    <item>
      <title>How to Implement Incremental Loading in Azure Databricks for ETL</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119688#M45944</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.&lt;/P&gt;&lt;P&gt;Right now, the notebook loads all data from the beginning every time it runs, which is inefficient and causes unnecessary processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.&lt;/P&gt;&lt;P&gt;My setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Source: Azure SQL Database&lt;/LI&gt;&lt;LI&gt;Target: Databricks Delta Table&lt;/LI&gt;&lt;LI&gt;Scheduler: Daily Databricks job&lt;/LI&gt;&lt;LI&gt;Purpose: Power BI dashboards using processed data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What I'm looking for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A standard or recommended approach to implement incremental loading in Databricks&lt;/LI&gt;&lt;LI&gt;Best practices for tracking the last load timestamp (e.g., using a watermark)&lt;/LI&gt;&lt;LI&gt;Example code or a step-by-step tutorial&lt;/LI&gt;&lt;LI&gt;Any built-in Databricks utilities or patterns to support this on the Standard Tier&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you've set this up before or know of any good resources, I’d really appreciate your help!&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Tue, 20 May 2025 05:15:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119688#M45944</guid>
      <dc:creator>chexa_Wee</dc:creator>
      <dc:date>2025-05-20T05:15:43Z</dc:date>
    </item>
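A common answer to the question above is a watermark-based pull: keep the timestamp of the last successful load in a small control table, read only newer rows from Azure SQL via JDBC, and MERGE them into the Delta target. The sketch below is a rough illustration, not an official Databricks utility; the names (`etl_control`, `dbo.Orders`, `ModifiedDate`, `main.silver.orders`, `jdbc_url`) are assumptions you would replace with your own.

```python
def build_incremental_query(source_table: str, watermark_col: str, last_watermark: str) -> str:
    """Build a JDBC pushdown subquery that selects only rows changed since the last run.

    Passing this as the `dbtable` option makes Azure SQL do the filtering,
    so only new/changed rows cross the wire.
    """
    return (
        f"(SELECT * FROM {source_table} "
        f"WHERE {watermark_col} > '{last_watermark}') AS src"
    )

# --- Databricks-only part (requires a cluster; names are illustrative) ---
# last_wm = spark.sql(
#     "SELECT MAX(watermark) FROM etl_control WHERE job = 'daily_load'"
# ).first()[0]
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)  # your Azure SQL JDBC URL
#       .option("dbtable", build_incremental_query("dbo.Orders", "ModifiedDate", str(last_wm)))
#       .load())
#
# from delta.tables import DeltaTable
# tgt = DeltaTable.forName(spark, "main.silver.orders")
# (tgt.alias("t")
#     .merge(df.alias("s"), "t.OrderID = s.OrderID")  # upsert on the business key
#     .whenMatchedUpdateAll()
#     .whenNotMatchedInsertAll()
#     .execute())
#
# # Advance the watermark only after the MERGE succeeds.
# spark.sql("UPDATE etl_control SET watermark = current_timestamp() WHERE job = 'daily_load'")
```

Updating the watermark as the last step matters: if the job fails mid-run, the next run simply re-reads the same window, and the MERGE keeps the load idempotent.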
    <item>
      <title>Re: How to Implement Incremental Loading in Azure Databricks for ETL</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120024#M46030</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155027"&gt;@chexa_Wee&lt;/a&gt;, this was answered in a recent post:&amp;nbsp;&lt;A href="https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120020#M46027" target="_blank"&gt;https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120020#M46027&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 05:24:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120024#M46030</guid>
      <dc:creator>nikhilj0421</dc:creator>
      <dc:date>2025-05-23T05:24:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to Implement Incremental Loading in Azure Databricks for ETL</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120036#M46037</link>
      <description>&lt;P&gt;In case you do not want to use DLT (and there are reasons not to), you can also check the docs for &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/" target="_self"&gt;Auto Loader&lt;/A&gt; and &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/merge" target="_self"&gt;MERGE&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;These two do essentially the same as DLT, but without the extra cost and with more control; you do have to write more code, though.&lt;BR /&gt;For ingesting the SQL Server data I would use Data Factory, which lands the data in your bronze layer (ADLS Gen2).&lt;BR /&gt;Alternatively, use the Azure SQL connector of Databricks, but that runs on DLT and is more expensive than ADF; it is easier to use, but gives you less control and visibility.&lt;BR /&gt;So you see, many choices.&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 08:02:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120036#M46037</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-05-23T08:02:46Z</dc:date>
    </item>
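For the file-based route suggested in the reply above (ADF lands extracts in ADLS Gen2, Databricks picks up only new files), Auto Loader tracks ingested files for you. A minimal sketch, assuming parquet landing files; the paths (`abfss://bronze@...`) and table name are placeholders, and `cloudFiles.format` / `cloudFiles.schemaLocation` are real Auto Loader options.

```python
def autoloader_options(fmt: str, schema_path: str) -> dict:
    """Minimal Auto Loader reader options: input format plus a schema-tracking location."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_path,
    }

# --- Databricks-only part (requires a cluster; paths are illustrative) ---
# opts = autoloader_options("parquet", "abfss://bronze@yourlake.dfs.core.windows.net/_schemas/orders")
# (spark.readStream.format("cloudFiles")
#     .options(**opts)
#     .load("abfss://bronze@yourlake.dfs.core.windows.net/orders/")
#     .writeStream
#     .option("checkpointLocation",
#             "abfss://bronze@yourlake.dfs.core.windows.net/_checkpoints/orders")
#     .trigger(availableNow=True)  # run as a batch-style daily job, then stop
#     .toTable("main.silver.orders_raw"))
```

With `trigger(availableNow=True)` the stream processes whatever files are new since the last checkpoint and exits, which fits the daily-job setup described in the question.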
  </channel>
</rss>

