<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to Implement Incremental Loading in Azure Databricks for ETL in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119929#M45995</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.&lt;/P&gt;&lt;P&gt;Right now, the notebook loads all data from the beginning every time it runs, which is inefficient and causes unnecessary processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.&lt;/P&gt;&lt;P&gt;My setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Source: Azure SQL Database&lt;/LI&gt;&lt;LI&gt;Target: Databricks Delta Table&lt;/LI&gt;&lt;LI&gt;Scheduler: Daily Databricks job&lt;/LI&gt;&lt;LI&gt;Purpose: Power BI dashboards using processed data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What I'm looking for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A standard or recommended approach to implement incremental loading in Databricks&lt;/LI&gt;&lt;LI&gt;Best practices for tracking the last load timestamp (e.g., using a watermark)&lt;/LI&gt;&lt;LI&gt;Example code or a step-by-step tutorial&lt;/LI&gt;&lt;LI&gt;Any built-in Databricks utilities or patterns to support this on the Standard Tier&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you've set this up before or know of any good resources, I’d really appreciate your help!&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Thu, 22 May 2025 06:09:29 GMT</pubDate>
    <dc:creator>chexa_Wee</dc:creator>
    <dc:date>2025-05-22T06:09:29Z</dc:date>
    <item>
      <title>How to Implement Incremental Loading in Azure Databricks for ETL</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119929#M45995</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently working on an ETL process using Azure Databricks (Standard Tier) where I load data from Azure SQL Database into Databricks. I run a notebook daily to extract, transform, and load the data for Power BI reports.&lt;/P&gt;&lt;P&gt;Right now, the notebook loads all data from the beginning every time it runs, which is inefficient and causes unnecessary processing time. I want to switch to incremental loading, so the job only fetches new or changed records since the last successful run.&lt;/P&gt;&lt;P&gt;My setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Source: Azure SQL Database&lt;/LI&gt;&lt;LI&gt;Target: Databricks Delta Table&lt;/LI&gt;&lt;LI&gt;Scheduler: Daily Databricks job&lt;/LI&gt;&lt;LI&gt;Purpose: Power BI dashboards using processed data&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;What I'm looking for:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A standard or recommended approach to implement incremental loading in Databricks&lt;/LI&gt;&lt;LI&gt;Best practices for tracking the last load timestamp (e.g., using a watermark)&lt;/LI&gt;&lt;LI&gt;Example code or a step-by-step tutorial&lt;/LI&gt;&lt;LI&gt;Any built-in Databricks utilities or patterns to support this on the Standard Tier&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you've set this up before or know of any good resources, I’d really appreciate your help!&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 06:09:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/119929#M45995</guid>
      <dc:creator>chexa_Wee</dc:creator>
      <dc:date>2025-05-22T06:09:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to Implement Incremental Loading in Azure Databricks for ETL</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120020#M46027</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155027"&gt;@chexa_Wee&lt;/a&gt;, you can use the DLT (Delta Live Tables) feature for this.&lt;/P&gt;
&lt;P&gt;Please check:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/dlt/transform" target="_blank"&gt;https://docs.databricks.com/aws/en/dlt/transform&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/dlt/stateful-processing" target="_blank"&gt;https://docs.databricks.com/aws/en/dlt/stateful-processing&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Here is the step-by-step tutorial:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/dlt/tutorials" target="_blank"&gt;https://docs.databricks.com/aws/en/dlt/tutorials&lt;/A&gt;&lt;/P&gt;
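&lt;P&gt;For the watermark approach mentioned in the question, here is a minimal sketch of the pattern rather than a definitive implementation. Everything in it is an assumption for illustration: the table name orders, the modified_at column, the watermark control table, and the helper incremental_load are hypothetical and do not come from this thread. Python's built-in sqlite3 stands in for both Azure SQL and the Delta table so the logic can be run anywhere; in a Databricks notebook the read would typically be a JDBC query and the upsert a Delta MERGE.&lt;/P&gt;
&lt;PRE&gt;
```python
import sqlite3


def incremental_load(source, target, table="orders"):
    """Copy rows changed since the stored watermark, then advance the watermark."""
    # Read the last successful watermark; fall back to the epoch on the first run.
    row = target.execute(
        "SELECT last_modified FROM watermark WHERE table_name = ?", (table,)
    ).fetchone()
    last = row[0] if row else "1970-01-01T00:00:00"

    # Fetch only rows modified after the watermark (ISO timestamps sort as text).
    changed = source.execute(
        f"SELECT id, amount, modified_at FROM {table} "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last,),
    ).fetchall()
    if not changed:
        return 0

    # Upsert the changes and advance the watermark in one transaction, so a
    # failed run leaves the old watermark in place and the rows are retried.
    with target:
        target.executemany(
            f"INSERT OR REPLACE INTO {table} (id, amount, modified_at) VALUES (?, ?, ?)",
            changed,
        )
        target.execute(
            "INSERT OR REPLACE INTO watermark (table_name, last_modified) VALUES (?, ?)",
            (table, changed[-1][2]),  # rows are ordered, so the last one is the newest
        )
    return len(changed)


def demo():
    """Run three loads against in-memory databases (hypothetical sample data)."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)"
    source.execute(ddl)
    target.execute(ddl)
    target.execute(
        "CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_modified TEXT)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2025-05-20T00:00:00"), (2, 20.0, "2025-05-21T00:00:00")],
    )
    first = incremental_load(source, target)  # initial load picks up both rows

    # One update and one insert arrive in the source after the first load.
    source.execute(
        "UPDATE orders SET amount = 25.0, modified_at = '2025-05-22T00:00:00' WHERE id = 2"
    )
    source.execute("INSERT INTO orders VALUES (3, 30.0, '2025-05-22T06:00:00')")
    second = incremental_load(source, target)  # only the two changed rows
    third = incremental_load(source, target)   # nothing new since the watermark
    rows = target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
    return first, second, third, rows
```
&lt;/PRE&gt;
&lt;P&gt;Calling demo() runs an initial load, a load after one update and one insert, and a load with no changes. Note that the strict "greater than" comparison can miss a row written later with exactly the same timestamp as the stored watermark, so the source needs a reliably increasing modified_at column (or the filter needs a small overlap window combined with the upsert).&lt;/P&gt;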
</description>
      <pubDate>Fri, 23 May 2025 05:12:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-incremental-loading-in-azure-databricks-for-etl/m-p/120020#M46027</guid>
      <dc:creator>nikhilj0421</dc:creator>
      <dc:date>2025-05-23T05:12:37Z</dc:date>
    </item>
  </channel>
</rss>

