Data Engineering

What does DLT INITIALIZING phase do?

antr
New Contributor II

In Delta Live Tables, the INITIALIZING phase sometimes takes a minute and sometimes five minutes. I'd like to learn what it is doing in the background and whether it can be optimized in any way.

3 REPLIES

szymon_dybczak
Contributor III

Hi @antr,

 

In Delta Live Tables (DLT), a feature of Databricks, the "Initializing" phase is the first step in the lifecycle of a DLT pipeline run. During this phase, the platform sets up the resources, configurations, and dependencies required to execute the pipeline. Generally speaking, the following happens:

  1. Setting up all tables: DLT sets up all required tables for you if they don't exist yet.
  2. Dependency Resolution: DLT resolves any dependencies, such as libraries or packages needed for the transformations defined in the pipeline.
  3. Data Flow and Schema Validation: DLT validates the data flows and checks the schema and lineage definitions specified in the pipeline. This includes any constraints or quality checks defined in the pipeline configuration.
  4. Execution Plan Preparation: An execution plan is prepared based on the pipeline's DAG (directed acyclic graph) of transformations and tables. It outlines the sequence in which data processing steps will be executed.

After the "Initializing" phase, the pipeline moves into the "Running" phase, where the actual data processing and transformation occur. If there are any issues during initialization (such as missing dependencies or configuration errors), the pipeline may fail at this stage, requiring attention to resolve those issues.
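To make the execution plan idea a bit more concrete, here is a minimal sketch of how a processing order can be derived from a dependency graph. This is not DLT's actual implementation (which isn't public); it only illustrates a topological ordering with Python's standard `graphlib` module. The table names and the `execution_order` helper are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it reads from.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_enriched": {"clean_orders", "raw_customers"},
}


def execution_order(deps):
    """Return one valid processing order in which every table comes after its inputs."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # A cycle means no valid plan exists, so the graph fails validation.
        raise ValueError(f"pipeline graph has a cycle: {err.args[1]}") from err


order = execution_order(dependencies)
print(order)
```

A graph with a cycle (for example, two tables that read from each other) has no valid order, which is the kind of problem you would expect to surface before any data is processed.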

antr
New Contributor II

Thanks for the answer. I'm a bit confused, as SETTING_UP_TABLES is a separate phase after INITIALIZING, and you say "Setting up all tables" is part of initialization. Could you elaborate on how they differ?

Yeah, sorry. I wrote from memory, so I might have mixed this up. The point is that preparing the pipeline for execution takes some time, and the more complex the pipeline is, the longer it takes. And since the DLT framework is not open source, we can't be 100% sure what happens at each stage of the preparation phase.
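Since the internals aren't visible, one practical option is to measure how long each phase takes across runs and see whether INITIALIZING grows with pipeline complexity. Below is a rough sketch that assumes you have already collected the state changes of one update as (timestamp, state) pairs, for example from the pipeline's event log. The sample timestamps are invented, and `phase_durations` is a hypothetical helper, not part of any Databricks API.

```python
from datetime import datetime

# Invented sample data: (timestamp, state) pairs for a single pipeline update.
events = [
    ("2024-08-20T10:00:00", "WAITING_FOR_RESOURCES"),
    ("2024-08-20T10:02:30", "INITIALIZING"),
    ("2024-08-20T10:05:10", "SETTING_UP_TABLES"),
    ("2024-08-20T10:05:40", "RUNNING"),
    ("2024-08-20T10:09:00", "COMPLETED"),
]


def phase_durations(events):
    """Return seconds spent in each state, measured until the next state change."""
    ordered = sorted((datetime.fromisoformat(ts), state) for ts, state in events)
    durations = {}
    # Pair each event with the one that follows it; the last state has no end time.
    for (start, state), (end, _next_state) in zip(ordered, ordered[1:]):
        durations[state] = durations.get(state, 0.0) + (end - start).total_seconds()
    return durations


durations = phase_durations(events)
print(durations["INITIALIZING"])       # 160.0
print(durations["SETTING_UP_TABLES"])  # 30.0
```

Comparing these numbers between a small and a large pipeline would at least show which phase the extra minutes go to, even if it doesn't explain what happens inside it.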
