What does the DLT INITIALIZING phase do?
10-09-2024 03:01 AM
In Delta Live Tables, the INITIALIZING phase sometimes takes a minute and sometimes 5 minutes. I'd like to learn what it is doing in the background and whether it can be optimized in any way.
10-09-2024 04:23 AM - edited 10-09-2024 04:27 AM
Hi @antr,
In Delta Live Tables (DLT), a feature of Databricks, the "Initializing" phase refers to the first step in the lifecycle of a DLT pipeline run. During this phase, the platform sets up the resources, configurations, and dependencies required to execute the pipeline. Generally speaking, the following will happen:
- Setting up all tables: DLT will set up all required tables for you if they don't exist yet.
- Dependency Resolution: DLT resolves any dependencies, such as libraries or packages needed for the transformations defined in the pipeline.
- Data Flow and Schema Validation: DLT validates the data flows and checks the schema and lineage definitions specified in the pipeline. This includes any constraints or quality checks defined in the pipeline configuration.
- Execution Plan Preparation: An execution plan is prepared based on the pipeline's DAG (Directed Acyclic Graph) of transformations and tables. This outlines the sequence in which data processing steps will be executed.
After the "Initializing" phase, the pipeline moves into the "Running" phase, where the actual data processing and transformation occur. If there are any issues during initialization (such as missing dependencies or configuration errors), the pipeline may fail at this stage, requiring attention to resolve those issues.
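To make the "execution plan from a DAG" idea concrete, here is a minimal sketch of ordering tables so that each one comes after its inputs. This is only an illustration using Python's standard `graphlib` module, not how DLT does it internally; the table names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it reads from.
dependencies = {
    "bronze_orders": set(),
    "silver_orders": {"bronze_orders"},
    "gold_daily_revenue": {"silver_orders"},
}


def execution_order(deps):
    """Return table names in an order where every table follows its inputs."""
    # TopologicalSorter expects {node: predecessors}; static_order() yields
    # predecessors first and raises graphlib.CycleError on a cyclic graph.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A larger or more branched graph means more work to resolve and validate before anything runs, which fits the observation that more complex pipelines take longer to prepare.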
10-10-2024 12:37 AM
Thanks for the answer. I'm a bit confused, as SETTING_UP_TABLES is a separate phase after INITIALIZING, and you say "Setting up all tables" is part of initialization. Could you elaborate on how they differ?
10-10-2024 07:37 AM
Yeah, sorry. I wrote from memory, so I might have mixed this up. The point is that preparing the pipeline for execution takes some time, and the more complex the pipeline is, the longer it takes. And since the DLT framework is not open source, we can't be 100% sure what happens at each stage of the preparation phase.
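Even without knowing the internals, you can at least measure how long each phase lasts from run to run. Below is a rough sketch that assumes you have already exported the state changes of one update as (timestamp, state) pairs, for example from the pipeline's event log. The sample data and the `phase_durations` helper are made up for illustration, and how you extract the pairs is not covered here.

```python
from datetime import datetime

# Hypothetical (timestamp, state) pairs for a single pipeline update.
events = [
    ("2024-10-09T03:00:00", "INITIALIZING"),
    ("2024-10-09T03:04:30", "SETTING_UP_TABLES"),
    ("2024-10-09T03:05:10", "RUNNING"),
]


def phase_durations(events):
    """Seconds spent in each state, measured until the next state change."""
    ordered = sorted((datetime.fromisoformat(ts), state) for ts, state in events)
    durations = {}
    # Pair each event with its successor; the last state has no end time
    # in the data, so it is not reported.
    for (start, state), (end, _) in zip(ordered, ordered[1:]):
        durations[state] = durations.get(state, 0.0) + (end - start).total_seconds()
    return durations


durations = phase_durations(events)
print(durations)
```

Comparing these numbers across runs of the same pipeline should show whether the variation really comes from INITIALIZING or from a neighbouring phase.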

