Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

yvishal519
Contributor

Hello Community,

I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable way to determine, within the pipeline's Python codebase, whether it was triggered as a full refresh or a normal incremental run.

My Requirements:

  1. Dynamic Identification: The solution should enable the code to dynamically identify the type of run (full refresh vs. incremental).
  2. Pipeline Configuration: Ideally, this should be achieved by configuring something within the DLT pipeline, such as a parameter or flag.
  3. Accessing the Configuration: The configuration should be accessible within the Python code during execution, allowing me to assign the information to variables for downstream logic.

My Questions:

  • Is there an existing way in Databricks DLT to configure and identify the type of run?
  • Can the run type (full refresh vs. incremental) be passed as a parameter or stored in a metadata table that the pipeline can read?
  • Are there any best practices for handling such scenarios efficiently in DLT?

Any guidance, examples, or insights from your experience would be greatly appreciated.

Thank you in advance for your support!

1 REPLY

TakuyaOmi
Valued Contributor II

Hello,

There are two ways to determine whether a DLT pipeline is running in Full Refresh or Incremental mode:

  1. DLT Event Log Schema
    The details column in the DLT event log includes a "full_refresh" field. Check whether its value is true or false to tell a full refresh from an incremental run.

    DLT Event Log Schema Documentation

    An example of the details column is as follows:

    {"user_action":{"action":"START","user_name":"xxxxxxx@gmail.com","user_id":xxxxxxxx,"request":{"start_request":{"full_refresh":false,"validate_only":false}}}}
  2. Databricks REST API
    You can retrieve information about a DLT pipeline update through the Databricks REST API. The response also contains the "full_refresh" field, so you can check whether it is true or false in the same way.

    Since you can invoke the Databricks REST API from Python, this might help you achieve what you’re aiming for.

    Databricks REST API Documentation - Get Pipeline Update
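
As a rough illustration of both options, here is a minimal Python sketch that extracts the flag once you already have the data in hand. It assumes the details value arrives as a JSON string shaped like the example above, and that the Get Pipeline Update response nests the flag as {"update": {"full_refresh": ...}} under the path /api/2.0/pipelines/{pipeline_id}/updates/{update_id}; please confirm both against the linked documentation. The function names, the sample payload, and the placeholder host are hypothetical and only for illustration.

```python
import json
from typing import Any, Optional


def _dig(obj: Any, *keys: str) -> Any:
    """Walk nested dicts key by key; return None as soon as a level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def full_refresh_from_event_details(details_json: str) -> Optional[bool]:
    """Read full_refresh from an event log `details` JSON string.

    Assumes the user_action -> request -> start_request nesting shown in
    the example above. Returns None when the event carries no such flag.
    """
    details = json.loads(details_json)
    value = _dig(details, "user_action", "request", "start_request", "full_refresh")
    return value if isinstance(value, bool) else None


def update_url(host: str, pipeline_id: str, update_id: str) -> str:
    """Build the Get Pipeline Update URL (path assumed from the REST API docs)."""
    return f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates/{update_id}"


def full_refresh_from_update_response(response: dict) -> Optional[bool]:
    """Read full_refresh from a parsed Get Pipeline Update response body.

    Assumes the flag sits under the top-level "update" object. Returns None
    when the field is absent from the response.
    """
    value = _dig(response, "update", "full_refresh")
    return value if isinstance(value, bool) else None


# Hypothetical sample mirroring the details example above (IDs are placeholders).
SAMPLE_DETAILS = json.dumps(
    {
        "user_action": {
            "action": "START",
            "user_name": "someone@example.com",
            "user_id": 0,
            "request": {
                "start_request": {"full_refresh": False, "validate_only": False}
            },
        }
    }
)

is_full_refresh = full_refresh_from_event_details(SAMPLE_DETAILS)
run_type = "full refresh" if is_full_refresh else "incremental"
```

Both helpers return None rather than False when the field is missing, so downstream logic can tell "not a full refresh" apart from "no information in this record". Fetching the event log rows or calling the REST endpoint (with your workspace URL and an access token) is left out here, since that part depends on how your workspace is set up.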

I hope this helps!
