05-10-2025 09:38 AM
I'm having some issues with direct publishing mode when defining a DLT workflow that includes tables whose schema defines foreign key constraints. When a foreign key constraint references a table that is not directly used in any of the defining table's joins, DLT does not seem to recognize the referenced table as a dependency, and I get an error stating that the table or view does not exist:
```
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `sample_catalog`.`sales_gold`.`user` cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog. To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01
```
I'm trying to use these constraints purely for informational/metadata purposes. Based on my reading, constraints are not enforced on DLT materialized views, which is fine for my use case. This seemed to work a couple of weeks ago, but now DLT appears to have trouble identifying this dependency and is effectively enforcing the referenced table's existence while the tables are being created in parallel.
Any ideas on how I can approach this? The information is very nice to have, especially with the ERD that it shows in the interface. I've tried adding 'NOT ENFORCED' to the constraints, to no avail.
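For reference, a minimal sketch of the shape of the definition in question (the column names and source table below are placeholders, and it assumes the constraint is declared in the schema DDL string of the @dlt.table decorator):
```python
import dlt

@dlt.table(
    name="orders",
    comment="Orders table with an informational FK to user",
    # Placeholder schema: the FK points at `user`, which is defined elsewhere
    # in the same pipeline but never joined in this table's query.
    schema="""
        order_id BIGINT NOT NULL,
        user_id  BIGINT,
        CONSTRAINT fk_orders_user
            FOREIGN KEY (user_id) REFERENCES user NOT ENFORCED
    """,
)
def orders():
    return spark.table("source_orders")
```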
05-10-2025 10:08 AM
This is a tricky situation where you want to leverage the metadata benefits (like the ERD visualization) without running into execution dependencies. Let me help you solve this issue.
The error suggests that DLT is trying to validate the foreign key relationships during pipeline execution, even though you want these constraints purely for metadata/documentation purposes. The NOT ENFORCED clause should work in theory, but it seems there are some limitations in the current implementation.
Solutions to Try
1. Use Development Mode First
```python
# Sets a Spark conf that enables change data feed by default on newly created Delta tables
spark.conf.set("spark.databricks.delta.properties.defaults.enableChangeDataFeed", "true")
```
2. Explicit Table Dependencies
Explicitly declare dependencies in your DLT definitions:
```python
import dlt
from pyspark.sql.functions import *

@dlt.table(
    name="orders",
    comment="Orders table with FK to users",
    table_properties={
        "quality": "gold"
    },
    # Explicitly state dependencies
    dependencies=["user"]
)
def orders():
    # Your transformation logic
    return spark.table("source_orders")
```
3. Use Schema Evolution Mode
Set your tables to use schema evolution mode, which can sometimes help with these types of dependency issues:
```sql
CREATE OR REFRESH LIVE TABLE orders
SCHEMA_EVOLUTION = RESCUE
COMMENT "Orders table with FK to users"
-- Rest of your definition
```
05-10-2025 11:47 AM
Hey @lingareddy_Alva,
Thank you for your response and assistance.
1. The error pops up in my workflow, which is currently running in development mode. The code provided seems to enable a CDC feed in the Spark configuration; is that equivalent to development mode? I'm deploying my code with Databricks Asset Bundles to a target that has development mode turned on. I also tried adding the CDC feed configuration to the table definition, but the 'table does not exist' error remained.
2. I tried adding 'dependencies' to the DLT definition as suggested; however, it doesn't seem to be a valid parameter for the dlt.table decorator. I also don't see it mentioned in the documentation. Is this parameter tied to a particular version or channel that I need to configure? Here's the error it gives me:
```
TypeError: table() got an unexpected keyword argument 'dependencies'
```
3. I tried setting schema evolution as described, but I am still receiving the 'table does not exist' error.
I appreciate your help with this; do you have any other suggestions? I've been spending hours trying to get this to work, and I'm almost at the point of just giving up on the ERD feature for the time being.
05-10-2025 04:42 PM
You're right that the dependencies parameter isn't available for the dlt.table decorator - that was my mistake. Let's focus on approaches that will actually work in your environment.
1. Try View-Based Approach
Instead of using FK constraints directly, create the tables without constraints and use views to establish the dependencies:
```python
# Define tables without FK constraints
@dlt.table(name="user")
def user():
    return spark.table("source_users")

@dlt.table(name="orders")
def orders():
    return spark.table("source_orders")

# Create a view that references both tables to establish the dependency
@dlt.view(name="orders_with_users")
def orders_with_users():
    return spark.sql("""
        SELECT o.*, u.*
        FROM LIVE.orders o
        LEFT JOIN LIVE.user u ON o.user_id = u.id
    """)
```
This ensures both tables are created before any attempt to join them.
2. Explicit Table Creation Order
Use expectations to control order of execution:
```python
# First create the referenced table
@dlt.table(name="user")
def user():
    return spark.table("source_users")

# Then create the referring table; reading LIVE.user in the function body is
# what makes DLT schedule `user` upstream of `orders`
@dlt.table(name="orders")
@dlt.expect_or_fail("referenced_table_exists", "1 = 1")  # Placeholder expectation, always true
def orders():
    # Fails at analysis time if LIVE.user does not exist; otherwise checks it has rows
    user_exists = spark.sql("SELECT COUNT(*) > 0 FROM LIVE.user").collect()[0][0]
    if not user_exists:
        raise Exception("Referenced table 'user' does not exist or is empty")
    return spark.table("source_orders")
```
3. Two-Phase Approach
Split your pipeline into two phases:
```python
# Phase 1: Create all tables without FK constraints
@dlt.table(name="user")
def user():
    return spark.table("source_users")

@dlt.table(name="orders")
def orders():
    return spark.table("source_orders")

# Phase 2: Add FK constraints via a metadata table that depends on all others
@dlt.table(name="metadata_updates")
def add_metadata():
    # First verify all tables exist
    tables_exist = spark.sql("""
        SELECT
            EXISTS (SELECT 1 FROM LIVE.user LIMIT 1) AND
            EXISTS (SELECT 1 FROM LIVE.orders LIMIT 1) AS all_exist
    """).collect()[0][0]

    if tables_exist:
        # Then apply FK constraints via SQL
        try:
            spark.sql("""
                ALTER TABLE LIVE.orders
                ADD CONSTRAINT fk_user_id
                FOREIGN KEY (user_id) REFERENCES LIVE.user(id) NOT ENFORCED
            """)
        except Exception as e:
            # Log but don't fail
            print(f"Failed to add FK constraint: {str(e)}")

    # Return a dummy dataframe so the table has data
    return spark.createDataFrame([[1]], ["metadata_applied"])
```
4. Post-Pipeline Job
If none of the above works, consider a two-step process:
1. Run your DLT pipeline without FK constraints.
2. Have a separate notebook job that runs after the DLT pipeline completes and adds the constraints:
```python
# In a separate notebook that runs after DLT
spark.sql("""
    ALTER TABLE sample_catalog.sales_gold.orders
    ADD CONSTRAINT fk_user_id
    FOREIGN KEY (user_id) REFERENCES sample_catalog.sales_gold.user(id) NOT ENFORCED
""")
```
5. Use Comments Instead of FK Constraints
If the goal is documentation, but not necessarily enforcing the relationship:
```python
@dlt.table(
    name="orders",
    comment="Orders table with FK: user_id references user(id)"
)
def orders():
    return spark.table("source_orders")
```
Given the issues you're experiencing, you might need to consider whether the ERD feature is worth the trouble right now. The underlying problem appears to be that DLT checks for the existence of referenced tables before it has created them all, even with NOT ENFORCED constraints.