Identifying Full Refresh vs. Incremental Runs in Delta Live Tables
01-22-2025 10:34 PM
Hello Community,
I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable way to determine, within the pipeline's Python codebase, whether it was triggered as a full refresh or a normal incremental run.
My Requirements:
- Dynamic Identification: The solution should enable the code to dynamically identify the type of run (full refresh vs. incremental).
- Pipeline Configuration: Ideally, this should be achieved by configuring something within the DLT pipeline, such as a parameter or flag.
- Accessing the Configuration: The configuration should be accessible within the Python code during execution, allowing me to assign the information to variables for downstream logic.
My Questions:
- Is there an existing way in Databricks DLT to configure and identify the type of run?
- Can the run type (full refresh vs. incremental) be passed as a parameter or stored in a metadata table that the pipeline can read?
- Are there any best practices for handling such scenarios efficiently in DLT?
Any guidance, examples, or insights from your experience would be greatly appreciated.
Thank you in advance for your support!
Labels: Delta Lake, Workflows
01-25-2025 02:22 AM
Hello,
There are two ways to determine whether a DLT pipeline is running in Full Refresh or Incremental mode:
DLT Event Log Schema
The details column in the DLT event log includes a "full_refresh" field. You can check it to see whether the update was started as a full refresh (true) or not (false). See the DLT Event Log Schema documentation for the full layout.
An example of the details column is as follows:
{"user_action":{"action":"START","user_name":"xxxxxxx@gmail.com","user_id":xxxxxxxx,"request":{"start_request":{"full_refresh":false,"validate_only":false}}}}
Databricks REST API
You can retrieve DLT pipeline update information through the Databricks REST API, and the response also contains the "full_refresh" field, which you can check for true or false. Since you can call the Databricks REST API from Python, this might help you achieve what you’re aiming for.
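As a rough sketch of the REST approach, the code below lists recent updates for a pipeline and reads "full_refresh" from the most recent one. It assumes the list-updates endpoint (GET /api/2.0/pipelines/{pipeline_id}/updates) returns an "updates" array whose entries carry "creation_time" and "full_refresh"; the function names are hypothetical, and treating a missing "full_refresh" as false is my assumption, so please check the API reference for your workspace.

```python
import json
import urllib.request
from typing import Optional


def fetch_updates(host: str, token: str, pipeline_id: str) -> dict:
    """Call the pipelines list-updates endpoint and return the parsed JSON body."""
    url = f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates?max_results=5"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def latest_update_is_full_refresh(response: dict) -> Optional[bool]:
    """Return full_refresh for the newest update in the response, or None if there are none."""
    updates = response.get("updates") or []
    if not updates:
        return None
    # Pick the newest update explicitly rather than relying on response order.
    latest = max(updates, key=lambda u: u.get("creation_time", 0))
    # Assumption: a missing field means the update was not a full refresh.
    return bool(latest.get("full_refresh", False))
```

You would call it along the lines of latest_update_is_full_refresh(fetch_updates(host, token, pipeline_id)), with the workspace URL, a personal access token, and the pipeline ID supplied from your own configuration or secrets.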
I hope this helps!
Takuya Omi (尾美拓哉)

