Databricks Community

antr · ‎10-09-2024

In Delta Live Tables, the INITIALIZING phase sometimes takes a minute, sometimes 5 minutes. I'd like to learn what it is doing in the background and whether it can be optimized in any way.

szymon_dybczak · ‎10-09-2024

Hi @antr,

In Delta Live Tables (DLT), a feature of Databricks, the "Initializing" phase refers to the first step in the lifecycle of a DLT pipeline run. During this phase, the platform sets up the resources, configurations, and dependencies required to execute the pipeline. Generally speaking, the following happens:

Setting up all tables: DLT will set up all required tables for you if they don't exist yet.
Dependency Resolution: DLT resolves any dependencies, such as libraries or packages needed for the transformations defined in the pipeline.
Data Flow and Schema Validation: DLT validates the data flows and checks the schema and lineage definitions specified in the pipeline. This includes any constraints or quality checks defined in the pipeline configuration.
Execution Plan Preparation: An execution plan is prepared based on the pipeline’s DAG (Directed Acyclic Graph) of transformations and tables. This outlines the sequence in which data processing steps will be executed.
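To illustrate the last point, here is a minimal sketch of how a processing order can be derived from a dependency graph. It uses Python's standard `graphlib` module and is not DLT's actual implementation, which is not public. The table names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it reads from.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_enriched": {"clean_orders", "raw_customers"},
}


def execution_order(deps):
    """Return one valid processing order for the given dependency graph.

    Raises graphlib.CycleError if the graph contains a cycle, which is the
    kind of problem that has to be caught before any data is processed.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Any order it returns places each table after all the tables it reads from, so `clean_orders` always comes after `raw_orders`. The relative order of independent tables is not guaranteed.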

After the "Initializing" phase, the pipeline moves into the "Running" phase, where the actual data processing and transformation occur. If there are any issues during initialization (such as missing dependencies or configuration errors), the pipeline may fail at this stage, requiring attention to resolve those issues.

antr · ‎10-10-2024

Thanks for the answer. I'm a bit confused, as SETTING_UP_TABLES is a separate phase after INITIALIZING, and you say "Setting up all tables" is part of initialization. Could you elaborate on how they differ?

szymon_dybczak · ‎10-10-2024

Yeah, sorry. I wrote from memory, so I might have mixed this up. The point is, preparing the pipeline for execution takes some time, and the more complex the pipeline is, the longer it takes. And since the DLT framework is not open source, we can't be 100% sure what happens at each stage of the preparation phase.
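One way to see where the time actually goes is to measure how long each phase lasts across several runs. Below is a minimal sketch that assumes you have already exported the state transitions of one update as (timestamp, state) pairs, for example from the pipeline's event log. The sample timestamps, the `phase_durations` helper, and the states other than INITIALIZING, SETTING_UP_TABLES, and RUNNING are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample data: one (timestamp, state) pair per state transition.
events = [
    ("2024-10-09T08:00:00", "WAITING_FOR_RESOURCES"),
    ("2024-10-09T08:03:10", "INITIALIZING"),
    ("2024-10-09T08:05:40", "SETTING_UP_TABLES"),
    ("2024-10-09T08:06:15", "RUNNING"),
    ("2024-10-09T08:20:00", "COMPLETED"),
]


def phase_durations(events):
    """Return the seconds spent in each state, keyed by state name.

    A state lasts from its own timestamp until the next transition, so the
    final state has no duration and is left out of the result.
    """
    parsed = sorted((datetime.fromisoformat(ts), state) for ts, state in events)
    durations = {}
    for (start, state), (end, _next_state) in zip(parsed, parsed[1:]):
        seconds = (end - start).total_seconds()
        durations[state] = durations.get(state, 0.0) + seconds
    return durations


durations = phase_durations(events)
for state, seconds in durations.items():
    print(f"{state}: {seconds:.0f}s")
```

Comparing the INITIALIZING duration between runs of the same pipeline shows whether the variation tracks pipeline changes or comes from something else.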

What does DLT INITIALIZING phase do?
