<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Databricks Serverless Pipelines - Incremental Refresh Doubts in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144008#M52247</link>
    <description>&lt;DIV class="p-rich_text_section"&gt;Hi Alf01 and welcome to the Databricks Community!&lt;BR /&gt;&lt;BR /&gt;The Lakeflow Spark Declarative Pipelines (SDP) cost model considers multiple factors when deciding whether to perform an incremental refresh or a full recompute. It makes a best-effort attempt to incrementally refresh results for all supported operations. To address your specific observations:&lt;/DIV&gt;</description>
&lt;UL class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;The "Cost" Field:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;The cost field you see in the event log is a legacy attribute and is not actually utilized by the current cost model. It is slated for removal soon, so you can safely ignore it for now.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Manual Control:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;While the model is generally accurate, we recognize the need for more control. We are working on a feature that will allow users to explicitly define the refresh strategy (incremental vs. full recompute) for Materialized Views (MVs). Stay tuned for updates!&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="p-rich_text_section"&gt;&lt;STRONG data-stringify-type="bold"&gt;Best Practices for Ensuring Incremental Refresh:&lt;/STRONG&gt; If you are seeing unexpected full recomputes, consider these optimizations:&lt;/DIV&gt;
&lt;UL class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Decompose Complex MVs:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Split large, complex MVs into multiple smaller ones. Excessive joins or deeply nested operators can sometimes exceed the complexity threshold for incrementalization.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Increase Update Frequency:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If source tables change significantly between runs, the model may determine a full recompute is cheaper. If you see&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;CHANGESET_SIZE_THRESHOLD_EXCEEDED&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in your logs, try running updates more frequently to reduce the volume of changes per update.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Ensure Deletion Vectors and Row-Level Tracking are enabled on your source tables:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Deletion vectors minimize the changeset size, and row-level tracking is a prerequisite for incrementalizing certain operators.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Non-Deterministic Functions:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;These are generally supported in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;WHERE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;clauses. Operator support is continuously updated in the docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://learn.microsoft.com/en-gb/azure/databricks/optimizations/incremental-refresh#enzyme-support" target="_blank" rel="noopener noreferrer" data-stringify-link="https://learn.microsoft.com/en-gb/azure/databricks/optimizations/incremental-refresh#enzyme-support" data-sk="tooltip_parent"&gt;Azure&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;|&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" target="_blank" rel="noopener noreferrer" data-stringify-link="https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" data-sk="tooltip_parent"&gt;AWS&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;|&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" target="_blank" rel="noopener noreferrer" data-stringify-link="https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" data-sk="tooltip_parent"&gt;GCP&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="p-rich_text_section"&gt;For a deeper dive into how these recomputes are calculated, I recommend this technical blog:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://www.databricks.com/blog/optimizing-materialized-views-recomputes" target="_blank" rel="noopener noreferrer" data-stringify-link="https://www.databricks.com/blog/optimizing-materialized-views-recomputes" data-sk="tooltip_parent"&gt;Optimizing Materialized Views Recomputes&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Hope this helps!&lt;/DIV&gt;</description>
    <pubDate>Wed, 14 Jan 2026 08:27:37 GMT</pubDate>
    <dc:creator>aleksandra_ch</dc:creator>
    <dc:date>2026-01-14T08:27:37Z</dc:date>
    <item>
      <title>Databricks Serverless Pipelines - Incremental Refresh Doubts</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/143459#M52180</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I would like to clarify some doubts regarding how Databricks Pipelines (DLT) behave when using serverless pipelines with incremental updates.&lt;BR /&gt;&lt;BR /&gt;In general, incremental processing is enabled and works as expected. However, I have observed some behaviors that I do not fully understand:&lt;/P&gt;&lt;P&gt;In several pipelines, the system selects the incremental execution plan even when the estimated cost of the incremental run appears to be higher than a complete recompute. In those cases, the incremental run indeed ends up taking longer and costing more, which we have been able to verify after execution.&lt;BR /&gt;&lt;BR /&gt;In other pipelines, I have noticed that there seems to be a cost limit or threshold (which I assume is related to the estimated cost of a complete recompute). When the incremental plan exceeds that limit, Databricks chooses to prioritize a complete recompute instead.&lt;/P&gt;&lt;P&gt;This suggests that such a mechanism exists, but it is not consistently triggered across pipelines.&lt;BR /&gt;&lt;BR /&gt;I would like to know more about:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;How Databricks actually selects the execution plan (incremental vs complete recompute)&lt;/LI&gt;&lt;LI&gt;How the cost estimation for each plan is calculated&lt;/LI&gt;&lt;LI&gt;Whether there are configurable parameters to influence or tune this decision&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I have tried to find official documentation explaining these internals, but I have found very limited details. Any pointers to relevant documentation, blog posts, or technical explanations would be greatly appreciated.&lt;BR /&gt;&lt;BR /&gt;Finally, a somewhat related question:&lt;BR /&gt;In the documentation I did find, it is mentioned that non-deterministic operations (e.g. those based on the current date/time) should invalidate incremental updates. 
However, in recent tests I have applied filters based on the current date (e.g. distance to today) and incremental updates were still applied (although some execution plans were discarded).&lt;BR /&gt;Under what conditions do such operations still allow incremental execution?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="json"&gt;{
            "comment": "This execution chose complete_recompute using a threshold",
            "technique_information": [
                {
                    "maintenance_type": "MAINTENANCE_TYPE_NO_OP",
                    "incrementalization_issues": [
                        {
                            "issue_type": "DATA_HAS_CHANGED",
                            "prevent_incrementalization": true
                        }
                    ]
                },
                {
                    "incrementalization_issues": [
                        {
                            "issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL",
                            "prevent_incrementalization": true,
                            "cost_model_rejection_subtype": "CHANGESET_SIZE_THRESHOLD_EXCEEDED"
                        }
                    ]
                },
                {
                    "maintenance_type": "MAINTENANCE_TYPE_COMPLETE_RECOMPUTE",
                    "is_chosen": true,
                    "is_applicable": true,
                    "cost": 24966691933335
                },
                {
                    "maintenance_type": "MAINTENANCE_TYPE_ROW_BASED",
                    "is_chosen": false,
                    "is_applicable": true,
                    "cost": 51838151965590
                }
            ],
            "source_table_information": [
                {
                    "table_name": "`inc_customer_event_with_customer_info`",
                    "catalog_table_type": "MATERIALIZED_VIEW",
                    "full_size": 77759940,
                    "num_rows": 2057197,
                    "num_files": 3,
                    "change_size": 77759940,
                    "num_changed_rows": 1166,
                    "num_rows_in_changed_files": 1166,
                    "num_changed_files": 3,
                    "change_file_read_size": 77759940,
                    "is_size_after_pruning": true,
                    "is_row_id_enabled": true,
                    "is_cdf_enabled": true,
                    "is_deletion_vector_enabled": true,
                    "is_change_from_legacy_cdf": true
                },
                {
                    "table_name": "`platform_event`",
                    "catalog_table_type": "STREAMING_TABLE",
                    "partition_columns": [
                        "TenantId"
                    ],
                    "full_size": 108057,
                    "num_rows": 1219,
                    "num_files": 14,
                    "change_size": 0,
                    "num_changed_rows": 0,
                    "num_rows_in_changed_files": 0,
                    "num_changed_files": 0,
                    "change_file_read_size": 0,
                    "is_size_after_pruning": true,
                    "is_row_id_enabled": true,
                    "is_cdf_enabled": true,
                    "is_deletion_vector_enabled": true,
                    "is_change_from_legacy_cdf": false
                },
                {
                    "table_name": "`platform_event_group`",
                    "catalog_table_type": "STREAMING_TABLE",
                    "partition_columns": [
                        "TenantId"
                    ],
                    "full_size": 84011,
                    "num_rows": 250,
                    "num_files": 13,
                    "change_size": 0,
                    "num_changed_rows": 0,
                    "num_rows_in_changed_files": 0,
                    "num_changed_files": 0,
                    "change_file_read_size": 0,
                    "is_size_after_pruning": true,
                    "is_row_id_enabled": true,
                    "is_cdf_enabled": true,
                    "is_deletion_vector_enabled": true,
                    "is_change_from_legacy_cdf": false
                }
            ],
            "target_table_information": {
                "table_name": "`inc_event_dashboard_model`",
                "full_size": 4860340,
                "is_row_id_enabled": true,
                "is_cdf_enabled": true,
                "is_deletion_vector_enabled": true
            },
            "planning_wall_time_ms": 4864,
            "fingerprint_info": {
                "primary_fingerprint_version": 1
            }
        }&lt;/LI-CODE&gt;&lt;LI-CODE lang="json"&gt;{
            "comment": "This execution chose incremental_refresh even with more cost than complete_recompute",
            "technique_information": [
                {
                    "maintenance_type": "MAINTENANCE_TYPE_NO_OP",
                    "incrementalization_issues": [
                        {
                            "issue_type": "DATA_HAS_CHANGED",
                            "prevent_incrementalization": true
                        }
                    ]
                },
                {
                    "maintenance_type": "MAINTENANCE_TYPE_COMPLETE_RECOMPUTE",
                    "is_chosen": false,
                    "is_applicable": true,
                    "cost": 925057881923
                },
                {
                    "maintenance_type": "MAINTENANCE_TYPE_ROW_BASED",
                    "is_chosen": true,
                    "is_applicable": true,
                    "cost": 11994639573063
                }
            ],
            "source_table_information": [
                {
                    "table_name": "`order`",
                    "catalog_table_type": "STREAMING_TABLE",
                    "partition_columns": [
                        "TenantId"
                    ],
                    "full_size": 14840711065,
                    "num_rows": 42379500,
                    "num_files": 228,
                    "change_size": 0,
                    "num_changed_rows": 0,
                    "num_rows_in_changed_files": 0,
                    "num_changed_files": 0,
                    "change_file_read_size": 0,
                    "is_size_after_pruning": true,
                    "is_row_id_enabled": true,
                    "is_cdf_enabled": true,
                    "is_deletion_vector_enabled": true,
                    "is_change_from_legacy_cdf": false
                },
                {
                    "table_name": "`dim_unaccounted_products`",
                    "catalog_table_type": "MANAGED",
                    "full_size": 3681,
                    "num_rows": 19,
                    "num_files": 2,
                    "change_size": 4379,
                    "num_changed_rows": 2,
                    "num_rows_in_changed_files": 2,
                    "num_changed_files": 2,
                    "change_file_read_size": 4379,
                    "is_size_after_pruning": true,
                    "is_row_id_enabled": true,
                    "is_cdf_enabled": true,
                    "is_deletion_vector_enabled": true,
                    "is_change_from_legacy_cdf": false
                }
            ],
            "target_table_information": {
                "table_name": "`inc_operation_analytics`",
                "full_size": 14554925095,
                "is_row_id_enabled": true,
                "is_cdf_enabled": true,
                "is_deletion_vector_enabled": true
            },
            "planning_wall_time_ms": 12122,
            "fingerprint_info": {
                "primary_fingerprint_version": 1
            }
        }&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jan 2026 10:18:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/143459#M52180</guid>
      <dc:creator>Alf01</dc:creator>
      <dc:date>2026-01-09T10:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Serverless Pipelines - Incremental Refresh Doubts</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144008#M52247</link>
      <description>&lt;DIV class="p-rich_text_section"&gt;Hi Alf01 and welcome to the Databricks Community!&lt;BR /&gt;&lt;BR /&gt;The Lakeflow Spark Declarative Pipelines (SDP) cost model considers multiple factors when deciding whether to perform an incremental refresh or a full recompute. It makes a best-effort attempt to incrementally refresh results for all supported operations. To address your specific observations:&lt;/DIV&gt;</description>
&lt;UL class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;The "Cost" Field:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;The cost field you see in the event log is a legacy attribute and is not actually utilized by the current cost model. It is slated for removal soon, so you can safely ignore it for now.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Manual Control:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;While the model is generally accurate, we recognize the need for more control. We are working on a feature that will allow users to explicitly define the refresh strategy (incremental vs. full recompute) for Materialized Views (MVs). Stay tuned for updates!&lt;/LI&gt;
&lt;/UL&gt;
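The cost-model decision and any blocking issues can be read directly from the planning payloads like the ones you posted. A minimal, hedged sketch (pure Python, assuming only the JSON shape shown in your question) that summarizes which maintenance technique was chosen and which issues prevented incrementalization:

```python
import json

def summarize_planning(payload: str) -> dict:
    """Summarize a planning-details payload of the shape posted in the
    question: report the chosen maintenance type and any issues that
    prevented incrementalization."""
    info = json.loads(payload)
    chosen = None
    blocked_by = []
    for tech in info.get("technique_information", []):
        if tech.get("is_chosen"):
            chosen = tech.get("maintenance_type")
        for issue in tech.get("incrementalization_issues", []):
            if issue.get("prevent_incrementalization"):
                # Prefer the more specific cost-model subtype when present
                blocked_by.append(
                    issue.get("cost_model_rejection_subtype")
                    or issue.get("issue_type")
                )
    return {"chosen": chosen, "blocked_by": blocked_by}
```

Fed the first payload from the question, this reports MAINTENANCE_TYPE_COMPLETE_RECOMPUTE as chosen and surfaces the CHANGESET_SIZE_THRESHOLD_EXCEEDED rejection.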
&lt;DIV class="p-rich_text_section"&gt;&lt;STRONG data-stringify-type="bold"&gt;Best Practices for Ensuring Incremental Refresh:&lt;/STRONG&gt; If you are seeing unexpected full recomputes, consider these optimizations:&lt;/DIV&gt;
&lt;UL class="p-rich_text_list p-rich_text_list__bullet p-rich_text_list--nested" data-stringify-type="unordered-list" data-list-tree="true" data-indent="0" data-border="0"&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Decompose Complex MVs:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Split large, complex MVs into multiple smaller ones. Excessive joins or deeply nested operators can sometimes exceed the complexity threshold for incrementalization.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Increase Update Frequency:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If source tables change significantly between runs, the model may determine a full recompute is cheaper. If you see&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;CHANGESET_SIZE_THRESHOLD_EXCEEDED&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in your logs, try running updates more frequently to reduce the volume of changes per update.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Ensure Deletion Vectors and Row-Level Tracking are enabled on your source tables:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Deletion vectors minimize the changeset size, and row-level tracking is a prerequisite for incrementalizing certain operators.&lt;/LI&gt;
&lt;LI data-stringify-indent="0" data-stringify-border="0"&gt;&lt;STRONG data-stringify-type="bold"&gt;Non-Deterministic Functions:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;These are generally supported in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="c-mrkdwn__code" data-stringify-type="code"&gt;WHERE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;clauses. Operator support is continuously updated in the docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://learn.microsoft.com/en-gb/azure/databricks/optimizations/incremental-refresh#enzyme-support" target="_blank" rel="noopener noreferrer" data-stringify-link="https://learn.microsoft.com/en-gb/azure/databricks/optimizations/incremental-refresh#enzyme-support" data-sk="tooltip_parent"&gt;Azure&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;|&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" target="_blank" rel="noopener noreferrer" data-stringify-link="https://docs.databricks.com/aws/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" data-sk="tooltip_parent"&gt;AWS&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;|&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" target="_blank" rel="noopener noreferrer" data-stringify-link="https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#support-for-materialized-view-incremental-refresh" data-sk="tooltip_parent"&gt;GCP&lt;/A&gt;.&lt;/LI&gt;
&lt;/UL&gt;
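To act on the deletion-vector and row-tracking advice above, both are ordinary Delta table properties (`delta.enableDeletionVectors` and `delta.enableRowTracking`). A small sketch that assembles the ALTER TABLE statements for a list of source tables (the table names here are placeholders; run each statement with spark.sql on your cluster):

```python
def enablement_statements(tables):
    """Build ALTER TABLE statements that enable deletion vectors and
    row-level tracking on each given source table."""
    props = (
        "'delta.enableDeletionVectors' = 'true', "
        "'delta.enableRowTracking' = 'true'"
    )
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"
        for table in tables
    ]

# Example (placeholder table name):
# for stmt in enablement_statements(["catalog.schema.orders"]):
#     spark.sql(stmt)
```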
&lt;DIV class="p-rich_text_section"&gt;For a deeper dive into how these recomputes are calculated, I recommend this technical blog:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="c-link" href="https://www.databricks.com/blog/optimizing-materialized-views-recomputes" target="_blank" rel="noopener noreferrer" data-stringify-link="https://www.databricks.com/blog/optimizing-materialized-views-recomputes" data-sk="tooltip_parent"&gt;Optimizing Materialized Views Recomputes&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Hope this helps!&lt;/DIV&gt;</description>
      <pubDate>Wed, 14 Jan 2026 08:27:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144008#M52247</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-01-14T08:27:37Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Serverless Pipelines - Incremental Refresh Doubts</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144249#M52293</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/204958"&gt;@Alf01&lt;/a&gt;&amp;nbsp;&amp;nbsp;, thanks for accepting the solution!&lt;/P&gt;
&lt;P&gt;To keep you updated: the REFRESH POLICY feature I mentioned in my post is now available! It allows manual control of the refresh strategy (AUTO, INCREMENTAL, INCREMENTAL STRICT, FULL), just as you asked for in your question.&lt;/P&gt;
&lt;P&gt;Check out the docs: &lt;A href="https://learn.microsoft.com/en-gb/azure/databricks/optimizations/incremental-refresh#refresh-policy" target="_self"&gt;Azure&lt;/A&gt; | &lt;A href="https://docs.databricks.com/aws/en/optimizations/incremental-refresh#refresh-policy" target="_self"&gt;AWS&lt;/A&gt; | &lt;A href="https://docs.databricks.com/gcp/en/optimizations/incremental-refresh#refresh-policy" target="_self"&gt;GCP&lt;/A&gt;&lt;/P&gt;
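For illustration, the policy is declared in the materialized view's DDL; a hedged sketch that assembles such a statement (the view name, query, and exact keyword placement are illustrative assumptions here; confirm the syntax against the docs linked above before using it):

```python
def mv_ddl(name: str, query: str, policy: str = "AUTO") -> str:
    """Assemble a CREATE MATERIALIZED VIEW statement with a REFRESH POLICY
    clause. Illustrative only: verify exact syntax against the docs."""
    allowed = {"AUTO", "INCREMENTAL", "INCREMENTAL STRICT", "FULL"}
    if policy not in allowed:
        raise ValueError(f"unknown refresh policy: {policy}")
    return (
        f"CREATE OR REPLACE MATERIALIZED VIEW {name}\n"
        f"  REFRESH POLICY {policy}\n"
        f"AS {query}"
    )

# ddl = mv_ddl("sample_mv", "SELECT * FROM sample_users", "INCREMENTAL STRICT")
# spark.sql(ddl)
```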
&lt;P&gt;Best regards,&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2026 13:45:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144249#M52293</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-01-16T13:45:05Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Serverless Pipelines - Incremental Refresh Doubts</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144391#M52317</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102072"&gt;@aleksandra_ch&lt;/a&gt;, thank you very much for your response and for keeping me updated on the new control options.&lt;/P&gt;&lt;P&gt;I have one additional question after reviewing the different pieces of documentation.&lt;BR /&gt;Is it considered a best practice or standard approach to build pipelines using Spark SQL rather than PySpark?&lt;/P&gt;&lt;P&gt;I ask this because, when reading the documentation, features such as Liquid Clustering, partition management, and the control mechanism described in the page you shared are mostly documented and explained using SQL examples.&lt;/P&gt;&lt;P&gt;Thank you in advance for your clarification.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jan 2026 08:28:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144391#M52317</guid>
      <dc:creator>Alf01</dc:creator>
      <dc:date>2026-01-19T08:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Serverless Pipelines - Incremental Refresh Doubts</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144582#M52342</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/204958"&gt;@Alf01&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Both Spark SQL and PySpark are valid approaches for developing Lakeflow Spark Declarative Pipelines. PySpark exposes some additional low-level APIs that are not (yet) available in SQL (for example,&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ldp/developer/ldp-python-ref-apply-changes-from-snapshot" target="_self"&gt;AUTO CDC FROM SNAPSHOT&lt;/A&gt;).&lt;/P&gt;
&lt;P&gt;Also, Python offers flexibility regarding unit testing and CI/CD practices (check out &lt;A href="https://www.databricks.com/blog/applying-software-development-devops-best-practices-delta-live-table-pipelines" target="_self"&gt;an excellent blog&lt;/A&gt; on this).&amp;nbsp;&lt;/P&gt;
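As a small illustration of that testing flexibility, a helper such as the utils.is_valid_email referenced further down could live as a plain Python function and be covered by ordinary unit tests with no cluster involved (a hypothetical sketch; the pipeline would wrap it as a Spark UDF):

```python
import re

# Hypothetical stand-in for a utils.is_valid_email helper: pure Python
# logic that ordinary unit tests can exercise directly.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(address: str) -> bool:
    """Return True when the address looks like a well-formed email."""
    return bool(address) and EMAIL_RE.match(address) is not None
```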
&lt;P&gt;In Python, define the refresh policy with a &lt;STRONG&gt;refresh_policy&lt;/STRONG&gt; argument on the table decorator:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pyspark.sql.functions import col, count, count_if

# Options: "auto", "incremental", "incremental_strict", "full"
@dp.table(refresh_policy="full")
def sample_aggregation():
    return (
        spark.read.table("sample_users")
        .withColumn("valid_email", utils.is_valid_email(col("email")))
        .groupBy(col("user_type"))
        .agg(
            count("user_id").alias("total_count"),
            count_if("valid_email").alias("count_valid_emails")
        )
    )&lt;/LI-CODE&gt;
&lt;P&gt;The other features you cited (liquid clustering) are available in Python as well.&lt;/P&gt;
&lt;P&gt;Other than that, the choice between SQL and Python ultimately depends on your team's expertise and the overall practices adopted in your organisation.&lt;/P&gt;
&lt;P&gt;Hope this helps!&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jan 2026 14:32:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-serverless-pipelines-incremental-refresh-doubts/m-p/144582#M52342</guid>
      <dc:creator>aleksandra_ch</dc:creator>
      <dc:date>2026-01-20T14:32:41Z</dc:date>
    </item>
  </channel>
</rss>

