Databricks Community

Malthe · ‎08-15-2025

When defining a streaming table using DLT (declarative pipelines), we can provide a schema, which lets us define primary and foreign key constraints.

However, references to self, i.e. to the table being defined, are not currently allowed (you get a "table not found" error).

Since DLT doesn't allow you to alter tables created through the framework, there's no way to define a self-referential constraint on a streaming table, e.g. for nested hierarchies.

WiliamRosa · ‎08-15-2025

Currently, Delta Live Tables (DLT) does not support defining self-referential constraints (e.g., a foreign key pointing back to the same streaming table) at creation time. Because the schema of a DLT-managed table cannot be changed through ALTER TABLE, there is also no supported way to add such a constraint later. For hierarchical or parent-child relationships within the same entity, the common workaround is to enforce the relationship at the data-processing layer, either by implementing validation logic in your transformation code or by creating an intermediate (Silver) table that performs self-joins or integrity checks before writing to the final (Gold) table. This preserves referential integrity logically, even though the constraint is not physically declared in the table metadata.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
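
To make the "validation logic" suggestion above more concrete, here is a minimal sketch of the integrity check in plain Python. The column names `id` and `parent_id`, the sample rows, and the helper `find_orphans` are all hypothetical; in an actual pipeline the same idea would more likely be expressed as a self-join (for example, a left anti join on `parent_id = id`) rather than in-memory Python.

```python
from typing import Iterable, List, Mapping, Optional


def find_orphans(
    rows: Iterable[Mapping[str, Optional[str]]],
    key: str = "id",
    parent_key: str = "parent_id",
) -> List[Optional[str]]:
    """Return the keys of rows whose parent reference matches no existing row."""
    rows = list(rows)
    known_ids = {row[key] for row in rows}
    return [
        row[key]
        for row in rows
        # A missing (None) parent marks a root node and is treated as valid.
        if row[parent_key] is not None and row[parent_key] not in known_ids
    ]


# Hypothetical sample data: "c" points at a parent that does not exist.
categories = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "c", "parent_id": "missing"},
]

orphans = find_orphans(categories)
print(orphans)  # ['c']
```

Rows flagged this way could be dropped, quarantined, or used to fail the update, depending on how strict the pipeline needs to be.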

Malthe · ‎08-15-2025

The reasons we're interested in having the foreign key relations defined are two-fold:

1. It serves as documentation for human users.
2. It enables the AI (Genie) to better assist with writing queries.

WiliamRosa · ‎08-15-2025

I see your point: having the foreign key definition directly in the table schema would indeed serve as valuable documentation and improve the ability of AI assistants like Genie to reason about joins and relationships. Since DLT currently doesn’t allow self-referential constraints, one potential workaround to preserve those benefits is to maintain a “data contract” or schema definition file (YAML/JSON) that includes these logical relationships, even if they can’t be physically enforced. This file can live alongside your pipeline code, be version-controlled, and serve both as human-readable documentation and as a source for tooling/AI prompts. Another option is to create a lightweight metadata table in Unity Catalog that lists entity relationships, including self-references, so that it’s queryable and can be leveraged by Genie or other assistants when generating SQL. While this doesn’t enforce the constraint in the storage layer, it still provides the semantic context you’re after.

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
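
As a rough illustration of the "data contract" and metadata-table ideas above, the sketch below keeps the logical relationships in one version-controlled Python structure and derives both documentation lines and table-ready rows from it. The `Relationship` class, the table names under `main.sales`, and the column names are all assumptions made for the example; nothing here is a Databricks API.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    """One logical (unenforced) foreign key relationship."""

    child_table: str
    child_column: str
    parent_table: str
    parent_column: str

    @property
    def is_self_reference(self) -> bool:
        return self.child_table == self.parent_table

    def describe(self) -> str:
        return (
            f"{self.child_table}.{self.child_column} references "
            f"{self.parent_table}.{self.parent_column}"
        )


# Hypothetical relationships; the first is the self-reference DLT rejects.
RELATIONSHIPS: List[Relationship] = [
    Relationship("main.sales.categories", "parent_id", "main.sales.categories", "id"),
    Relationship("main.sales.products", "category_id", "main.sales.categories", "id"),
]

# Plain dicts, ready to be serialized to YAML/JSON or loaded into a metadata table.
rows = [asdict(r) for r in RELATIONSHIPS]

# Human-readable lines for documentation or for inclusion in an assistant prompt.
doc_lines = [r.describe() for r in RELATIONSHIPS]
print("\n".join(doc_lines))
```

Keeping a single source like this avoids the documentation and the metadata table drifting apart, although it remains purely descriptive.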

Malthe · ‎08-15-2025

Each of these workarounds gives up the optimizations that are enabled by the use of key constraints.

ismaelhenzel · 2 weeks ago

I totally agree. It’s a mess that DLT can't handle self-referential constraints. I’ve had to create some ugly workaround functions to get past this, such as a function that tries to read the referenced table and returns an empty string if that throws an error; otherwise, it adds the constraint to the materialized view schema.
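
The original function isn't shown in this thread, so the following is only a guess at its shape. The helper `optional_fk_clause`, its parameters, and the table and column names are hypothetical. The existence check is passed in as a callable so the sketch runs without Spark; in a pipeline it might be something like `spark.catalog.tableExists`, and the resulting string would presumably be passed as the schema of the table definition.

```python
from typing import Callable


def optional_fk_clause(
    table_exists: Callable[[str], bool],
    constraint_name: str,
    column: str,
    referenced_table: str,
    referenced_column: str,
) -> str:
    """Return a foreign key clause, or "" if the referenced table can't be found."""
    try:
        exists = table_exists(referenced_table)
    except Exception:
        # Any failure while probing is treated the same as "table not there yet".
        exists = False
    if not exists:
        return ""
    return (
        f", CONSTRAINT {constraint_name} FOREIGN KEY ({column}) "
        f"REFERENCES {referenced_table} ({referenced_column})"
    )


# Hypothetical base schema for a table with a parent/child hierarchy.
base = "id STRING NOT NULL PRIMARY KEY, parent_id STRING"

# First run: the table does not exist yet, so no constraint is emitted.
first_run = base + optional_fk_clause(
    lambda name: False, "fk_parent", "parent_id", "main.sales.categories", "id"
)

# Later run: the table exists, so the constraint is appended to the schema.
later_run = base + optional_fk_clause(
    lambda name: True, "fk_parent", "parent_id", "main.sales.categories", "id"
)

print(first_run)
print(later_run)
```

The obvious drawback is that the schema differs between the first and subsequent runs, which is part of why this counts as a workaround rather than a fix.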

I really think this should be supported in declarative pipelines, with something like "add constraint fk IF table x EXISTS". Or, at the very least, ALTER MATERIALIZED VIEW should allow adding constraints in a separate step.

DLT has some awesome features, like incremental processing in materialized views, but it misses details like this that make a huge difference when planning a migration.

Self-referential foreign key constraint for streaming tables
