cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Self-referential foreign key constraint for streaming tables

Malthe
Contributor II

When defining a streaming tables using DLT (declarative pipelines), we can provide a schema which lets us define primary and foreign key constraints.

However, references to self, i.e. the defining table, are not currently allowed (you get a "table not found" error.)

Since with DLT, you're not allowed to alter tables created through the framework, there's no way to define a self-referential constraint, i.e. for nested hierarchies, for streaming tables.

4 REPLIES 4

WiliamRosa
Contributor

Currently, Delta Live Tables (DLT) does not support defining self-referential constraints (e.g., a foreign key pointing back to the same streaming table) at creation time, and because DLT-managed tables are immutable in terms of schema evolution through ALTER TABLE, thereโ€™s no supported way to add such constraints later. For hierarchical or parent-child relationships within the same entity, the common workaround is to enforce the relationship at the data-processing layerโ€”either by implementing validation logic in your transformation code or by creating an intermediate (Silver) table that performs self-joins or integrity checks before writing to the final (Gold) table. This preserves referential integrity logically, even though the constraint is not physically declared in the table metadata.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

Malthe
Contributor II

The reasons we're interested in having the foreign key relations defined are two-fold:

  1. It serves as documentation for human users.
  2. It enables the AI (Genie) to better assist with writing queries.

 

WiliamRosa
Contributor

I see your point โ€” having the foreign key definition directly in the table schema would indeed serve as valuable documentation and improve the ability of AI assistants like Genie to reason about joins and relationships. Since DLT currently doesnโ€™t allow self-referential constraints, one potential workaround to preserve those benefits is to maintain a โ€œdata contractโ€ or schema definition file (YAML/JSON) that includes these logical relationships, even if they canโ€™t be physically enforced. This file can live alongside your pipeline code, be version-controlled, and serve both as human-readable documentation and as a source for tooling/AI prompts. Another option is to create a lightweight metadata table in Unity Catalog that lists entity relationships โ€” including self-references โ€” so itโ€™s queryable and can be leveraged by Genie or other assistants when generating SQL. While this doesnโ€™t enforce the constraint in the storage layer, it still provides the semantic context youโ€™re after.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

Malthe
Contributor II

Each of these workarounds give up the optimizations that are enabled by the use of key constraints.