
Identifying Full Refresh vs. Incremental Runs in Delta Live Tables

yvishal519
Contributor

Hello Community,

I am working with a Delta Live Tables (DLT) pipeline that primarily operates in incremental mode. However, there are specific scenarios where I need to perform a full refresh of the pipeline. I am looking for an efficient and reliable way to determine, within the pipeline's Python codebase, whether it was triggered as a full refresh or a normal incremental run.

My Requirements:

  1. Dynamic Identification: The solution should enable the code to dynamically identify the type of run (full refresh vs. incremental).
  2. Pipeline Configuration: Ideally, this should be achieved by configuring something within the DLT pipeline, such as a parameter or flag.
  3. Accessing the Configuration: The configuration should be accessible within the Python code during execution, allowing me to assign the information to variables for downstream logic.

My Questions:

  • Is there an existing way in Databricks DLT to configure and identify the type of run?
  • Can the run type (full refresh vs. incremental) be passed as a parameter or stored in a metadata table that the pipeline can read?
  • Are there any best practices for handling such scenarios efficiently in DLT?

Any guidance, examples, or insights from your experience would be greatly appreciated.

Thank you in advance for your support!


Takuya-Omi
Valued Contributor II

Hello,

There are two ways to determine whether a DLT pipeline is running in Full Refresh or Incremental mode:

  1. DLT Event Log Schema
    The details column of the DLT event log records a "full_refresh" flag in the start request of each update. The flag is true for a full refresh and false for an incremental run.

    DLT Event Log Schema Documentation

    An example of the details column is as follows:

    {"user_action":{"action":"START","user_name":"xxxxxxx@gmail.com","user_id":xxxxxxxx,"request":{"start_request":{"full_refresh":false,"validate_only":false}}}}
  2. Databricks REST API
    The Databricks REST API returns information about a DLT pipeline update, and the response also contains the "full_refresh" field. It is true for a full refresh and false for an incremental run.

    Since you can invoke the Databricks REST API from Python, this might help you achieve what you’re aiming for.

    Databricks REST API Documentation - Get Pipeline Update
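
As a rough illustration of both options, here is a minimal Python sketch. The function names, the host/token arguments, and the sample values are hypothetical and only for illustration. The first function reads the flag from a details string shaped like the example above. The other two assume the Get Pipeline Update endpoint lives at /api/2.0/pipelines/{pipeline_id}/updates/{update_id} and returns an "update" object carrying "full_refresh", so please check that against the linked documentation for your workspace.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def full_refresh_from_details(details_json: str) -> Optional[bool]:
    """Read full_refresh from an event log `details` JSON string.

    Returns None when the event carries no start request.
    """
    details = json.loads(details_json)
    start_request = (
        details.get("user_action", {})
        .get("request", {})
        .get("start_request", {})
    )
    return start_request.get("full_refresh")


def fetch_update(host: str, token: str, pipeline_id: str, update_id: str) -> Dict[str, Any]:
    """Call the Get Pipeline Update endpoint (path assumed, see the note above).

    `host` is the workspace URL and `token` a personal access token;
    both are placeholders that the caller must supply.
    """
    url = f"{host}/api/2.0/pipelines/{pipeline_id}/updates/{update_id}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def full_refresh_from_update(payload: Dict[str, Any]) -> Optional[bool]:
    """Read full_refresh from a Get Pipeline Update response body."""
    return payload.get("update", {}).get("full_refresh")


# Sample event in the shape shown above; user_name and user_id are made-up values.
sample_details = json.dumps(
    {
        "user_action": {
            "action": "START",
            "user_name": "user@example.com",
            "user_id": 123,
            "request": {
                "start_request": {"full_refresh": False, "validate_only": False}
            },
        }
    }
)

is_full_refresh = full_refresh_from_details(sample_details)
```

The parsing helpers return None instead of raising when the field is missing, since not every event log row is a start request. The resulting value can be assigned to a variable and used in downstream logic.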

I hope this helps!

--------------------------
Takuya Omi (尾美拓哉)
