Data Engineering

DLT Flow Failed Due to Missing Flow Checkpoints Directory When Using Unity Catalog

minhhung0507
Valued Contributor

I’m encountering an issue while running a Delta Live Tables (DLT) pipeline that is managed using Unity Catalog on Databricks. The pipeline has failed and will not restart, showing the following error:

java.lang.IllegalArgumentException: flow checkpoints directory is not defined. Please contact Databricks support.
Stack trace snippet:

at com.databricks.pipelines.execution.core.FlowSystemMetadata.$anonfun$flowCheckpointsDir$1...
at com.databricks.pipelines.execution.core.FlowPlanner.plan...
at com.databricks.pipelines.execution.core.GraphExecution.$anonfun$startFlow$1...
...

Context:

  • I'm managing the DLT pipeline with Unity Catalog.

  • The flow that failed is: uat4_gold.lakehouse.gold_customerinfo

  • I'm using the new Databricks UI for monitoring pipeline runs.

  • The error says that the "flow checkpoints directory is not defined", but I didn’t explicitly set any custom checkpoint directory (expecting defaults to work).

  • This issue prevents the pipeline from even starting execution.

My Questions:

  1. What is the root cause of this "flow checkpoints directory is not defined" error in the context of Unity Catalog and DLT?

  2. How can I correctly configure the checkpoint location for DLT flows under Unity Catalog? Is there a setting I might be missing?

  3. Is this a known issue with Unity Catalog and DLT pipelines (possibly a bug or configuration oversight)?

  4. What are the best practices for setting up checkpointing in DLT pipelines when using Unity Catalog?

Any help or pointers (including relevant documentation) would be greatly appreciated.

Thanks in advance!

Regards,
Hung Nguyen

minhhung0507
Valued Contributor

(screenshot attached: image.png)

Regards,
Hung Nguyen

mark_ott
Databricks Employee

The root cause of the "flow checkpoints directory is not defined" error in a Delta Live Tables (DLT) pipeline managed by Unity Catalog typically lies in how checkpoint directories are set and managed in Unity Catalog-enabled pipelines. Unlike pipelines managed by the Hive Metastore, Unity Catalog-managed DLT pipelines do not use a separate storage location for internal files such as checkpoints; these files are stored within the managed location of the target table defined by Unity Catalog. If this location is missing, misconfigured, or not properly recognized by Databricks, the pipeline cannot initialize the checkpoint and fails with the reported error.

Root Cause Details

  • In Unity Catalog-enabled DLT pipelines, checkpoint directories are managed internally and stored within the managed location of the target table, not in a custom path or external storage unless explicitly supported by configuration.

  • This error generally arises if:

    • The target table's managed location is not properly set or accessible.

    • The Unity Catalog schema has an unsupported or missing storage configuration.

    • There is a bug or temporary misconfiguration in Databricks that fails to recognize the default checkpoint path, especially in environments using new or preview features.

Common Triggers

  • Attempting to specify a custom checkpoint or storage location for a managed DLT table in Unity Catalog, which is not supported and may cause pipeline initialization to fail.

  • Schema or catalog migrations, or upgrades that leave certain metadata (including storage location pointers) in an inconsistent state.

  • Renaming flows or tables in pipelines without carrying over checkpoint metadata, causing DLT to lose track of where internal files should be stored.

Best Practices & Resolutions

  • Do not set a custom storage or checkpoint directory manually for Unity Catalog-managed DLT pipelines; rely on Databricks defaults and ensure the target Unity Catalog table/schema is fully configured and accessible.

  • Check that the Unity Catalog schema uses a valid managed location and that compute resources have the necessary permissions to access this location.

  • For troubleshooting:

    • Verify the existence and accessibility of the table location in Unity Catalog.

    • Use Databricks REST APIs or SQL commands to inspect the table’s storage details if the UI does not show it directly (see the sketch after this list).

    • If the pipeline previously worked and suddenly stopped, review recent changes in catalog or schema configurations.

  • If the managed location or checkpoint files are missing or corrupted, regenerating the table or its checkpoint directory using Databricks commands may resolve the issue.
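
For example, here is a minimal sketch of both inspection approaches from a notebook. The workspace URL and token are placeholders, the table name comes from the original post, and the REST call assumes the Unity Catalog tables endpoint (GET /api/2.1/unity-catalog/tables/<full_name>):

  # Inspect the table's storage details with SQL
  spark.sql("DESCRIBE EXTENDED uat4_gold.lakehouse.gold_customerinfo").show(truncate=False)

  # Or fetch the same details through the Unity Catalog REST API
  import requests
  resp = requests.get(
      "https://<workspace-url>/api/2.1/unity-catalog/tables/uat4_gold.lakehouse.gold_customerinfo",
      headers={"Authorization": "Bearer <personal-access-token>"},
  )
  print(resp.json().get("storage_location"))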

Known Limitations

  • DLT with Unity Catalog support may still have preview/beta limitations in certain environments, and may not support all combinations of table/storage settings.

  • External tables with explicitly defined storage locations often don't work with Unity Catalog DLT pipelines.

If these checks and fixes do not resolve the error, contacting Databricks support is recommended, as there may be a backend configuration or product bug involved.

mark_ott
Databricks Employee

The best practices for setting up checkpointing in Delta Live Tables (DLT) pipelines with Unity Catalog center on leveraging Databricks' managed services, following Unity Catalog's table management conventions, and avoiding manual checkpoint directory configuration. DLT and Unity Catalog together abstract away most of the complexity around checkpoint storage, provided a few common configuration principles are observed.

Use Managed Tables and Default Locations

  • DLT pipelines with Unity Catalog always use managed tables; Databricks internally manages the checkpoint and metadata directories in the managed storage location associated with the catalog and schema.

  • Manual configuration of checkpoint locations is not required and is generally not supported. Rely on the defaults unless there is a specialized need (see the sketch after this list).
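
To make this concrete, here is a minimal sketch of a Python DLT table definition with no checkpoint or path settings at all; the table and source names are hypothetical:

  import dlt
  from pyspark.sql.functions import col

  # No checkpoint or storage options anywhere: with a Unity Catalog target,
  # DLT keeps checkpoints inside the table's managed location automatically.
  @dlt.table(name="gold_customerinfo", comment="Gold-layer customer info")
  def gold_customerinfo():
      return (
          spark.readStream.table("uat4_gold.lakehouse.silver_customerinfo")
          .where(col("customer_id").isNotNull())
      )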

Unity Catalog & Metadata Storage Best Practices

  • Always deploy DLT pipelines into Unity Catalog schemas that have a valid managed location defined in your cloud storage provider.

  • Do not attempt to specify custom checkpoint paths or table locations, as this can lead to errors or unsupported behavior in Unity Catalog-enabled pipelines.

  • Use DESCRIBE EXTENDED or Unity Catalog's catalog viewer to inspect the actual storage path if needed (see the sketch after this list).
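
For instance, a quick sketch of checking the managed location backing the target schema (catalog and schema names taken from the original post):

  # Shows the managed location (and other metadata) of the target schema
  spark.sql("DESCRIBE SCHEMA EXTENDED uat4_gold.lakehouse").show(truncate=False)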

Permissions and Access Control

  • Ensure that all compute clusters and resources running the pipeline have appropriate access to the managed storage location defined for the target schema.

  • Confirm that users and service principals have sufficient privileges to create and modify tables and materialized views in the target Unity Catalog locations (see the sketch after this list).
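
As a sketch, the kind of grants this typically involves; the service principal name is hypothetical and the securables are taken from the original post:

  # Privileges the pipeline's run-as principal typically needs on the target
  spark.sql("GRANT USE CATALOG ON CATALOG uat4_gold TO `pipeline-sp`")
  spark.sql(
      "GRANT USE SCHEMA, CREATE TABLE, CREATE MATERIALIZED VIEW "
      "ON SCHEMA uat4_gold.lakehouse TO `pipeline-sp`"
  )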

Streaming Table & Checkpoint Handling

  • For streaming tables, DLT internally manages checkpointing to track incremental progress, ensuring exactly-once processing semantics. No extra setup is necessary for checkpointing, even in triggered mode.

  • For additional raw file ingestion or hybrid scenarios, create volumes within the same Unity Catalog schema for ingesting and handling files, but keep the DLT datasets themselves managed by the platform.

  • If you need to manually handle checkpoints for external streaming sources, use Unity Catalog Volumes for the checkpoint directory paths; this is only for advanced cases (see the sketch after this list).
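
For that advanced case only, a hand-managed Structured Streaming job (outside DLT) with its checkpoint in a Unity Catalog Volume might look like the sketch below; the volume and paths are hypothetical:

  # Auto Loader ingestion with a checkpoint kept in a Unity Catalog Volume
  checkpoint_path = "/Volumes/uat4_gold/lakehouse/checkpoints/customerinfo_raw"

  (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      .load("/Volumes/uat4_gold/lakehouse/landing/customerinfo")
      .writeStream
      .option("checkpointLocation", checkpoint_path)
      .trigger(availableNow=True)
      .toTable("uat4_gold.lakehouse.customerinfo_raw"))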

Cleanup and Lifecycle

  • The lifecycle of checkpoint folders is managed automatically when you drop a table or modify pipeline definitions. If you use any volumes for streaming checkpoints outside DLT tables, manual cleanup may be necessary when dropping those tables (see the sketch below).
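
For example, a cleanup sketch for the volume-based checkpoint from the previous example; the names remain hypothetical:

  # Drop the manually managed table, then remove its now-orphaned checkpoint
  spark.sql("DROP TABLE IF EXISTS uat4_gold.lakehouse.customerinfo_raw")
  dbutils.fs.rm("/Volumes/uat4_gold/lakehouse/checkpoints/customerinfo_raw", True)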