Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

yvishal519 — Thu, 23 Jan 2025 06:34:57 GMT

Hello Community,

I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable way to determine, within the pipeline's Python codebase, whether it was triggered as a full refresh or a normal incremental run.

My Requirements:

Dynamic Identification: The solution should enable the code to dynamically identify the type of run (full refresh vs. incremental).
Pipeline Configuration: Ideally, this should be achieved by configuring something within the DLT pipeline, such as a parameter or flag.
Accessing the Configuration: The configuration should be accessible within the Python code during execution, allowing me to assign the information to variables for downstream logic.

My Questions:

Is there an existing way in Databricks DLT to configure and identify the type of run?
Can the run type (full refresh vs. incremental) be passed as a parameter or stored in a metadata table that the pipeline can read?
Are there any best practices for handling such scenarios efficiently in DLT?

Any guidance, examples, or insights from your experience would be greatly appreciated.

Thank you in advance for your support!

Re: Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

Takuya-Omi — Sat, 25 Jan 2025 10:22:13 GMT

Hello,

There are two ways to determine whether a DLT pipeline is running in Full Refresh or Incremental mode:

DLT Event Log Schema
The details column of the DLT event log records a "full_refresh" flag for the request that started the update. Check whether it is true or false to identify the run type.
DLT Event Log Schema Documentation
An example of the details column is as follows:
```
{"user_action":{"action":"START","user_name":"xxxxxxx@gmail.com","user_id":xxxxxxxx,"request":{"start_request":{"full_refresh":false,"validate_only":false}}}}
```
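As a minimal sketch of how that flag could be read in Python, assuming the details value has already been fetched from the event log as a JSON string: the helper name full_refresh_from_details and the sample payload below are illustrative, not part of any Databricks API, and the user_name and user_id values are placeholders.

```python
import json
from typing import Optional


def full_refresh_from_details(details_json: str) -> Optional[bool]:
    """Return the full_refresh flag of a user_action start request, or None.

    None means the event is not a start request (for example a flow progress
    event) or the flag is missing.
    """
    node = json.loads(details_json)
    # Walk details -> user_action -> request -> start_request, as in the example.
    for key in ("user_action", "request", "start_request"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, dict):
        return None
    flag = node.get("full_refresh")
    return flag if isinstance(flag, bool) else None


# Sample payload shaped like the example above (placeholder user values).
sample = json.dumps({
    "user_action": {
        "action": "START",
        "user_name": "someone@example.com",
        "user_id": 1234,
        "request": {
            "start_request": {"full_refresh": False, "validate_only": False}
        },
    }
})

is_full_refresh = full_refresh_from_details(sample)
run_type = "full refresh" if is_full_refresh else "incremental"
print(run_type)
```

The result can then be assigned to a variable for downstream logic. Returning None for events without a start request keeps unrelated event log rows from being mistaken for incremental runs.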
Databricks REST API
You can retrieve information about a DLT pipeline update using the Databricks REST API; the response also contains the "full_refresh" field, which is either true or false.
Since you can invoke the Databricks REST API from Python, this might help you achieve what you’re aiming for.
Databricks REST API Documentation - Get Pipeline Update
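A rough sketch of the REST approach, using only the Python standard library. It assumes the Get Pipeline Update endpoint lives at /api/2.0/pipelines/{pipeline_id}/updates/{update_id} and wraps the result in an "update" object; check both against the documentation for your workspace. The host, token, pipeline ID, and update ID are values you would need to supply, and the function names are illustrative.

```python
import json
import urllib.request
from typing import Optional


def update_url(host: str, pipeline_id: str, update_id: str) -> str:
    """Build the Get Pipeline Update URL (path is an assumption, see above)."""
    return f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates/{update_id}"


def full_refresh_from_response(payload: dict) -> Optional[bool]:
    """Extract update.full_refresh from a parsed response, or None if absent."""
    update = payload.get("update")
    if not isinstance(update, dict):
        return None
    flag = update.get("full_refresh")
    return flag if isinstance(flag, bool) else None


def fetch_full_refresh(host: str, token: str, pipeline_id: str, update_id: str) -> Optional[bool]:
    """Call the REST API with a bearer token and return the full_refresh flag."""
    request = urllib.request.Request(
        update_url(host, pipeline_id, update_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return full_refresh_from_response(json.load(response))
```

Calling fetch_full_refresh(host, token, pipeline_id, update_id) would return True, False, or None. Keeping the parsing separate from the HTTP call makes the logic easy to check without network access.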

I hope this helps!

Re: Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

km1837 — Wed, 10 Sep 2025 14:13:05 GMT

How do I get that into the notebook? When I click on Full refresh, I want a particular column in the pipeline table to capture that, saying "Full Refresh on <timestamp>".