mark_ott
Databricks Employee
Databricks Employee

The issue you're experiencing is a common challenge in Delta Live Tables (DLT) when implementing mixed refresh patterns (weekly full refresh + daily incremental updates) with schema evolution. The "__enzyme__row__id" column and schema mismatch errors are indicators of DLT's internal tracking mechanisms conflicting with your conditional logic.

Root Cause Analysis

The core problem stems from DLT's expectation of consistent schema across pipeline runs. When you switch between full catch-up mode (with is_deleted flag logic) and normal incremental mode, you're essentially creating two different schemas:​

  1. Full catch-up mode: Includes additional columns and logic for deleted record detection

  2. Normal mode: Standard upsert without deleted record tracking

The "__enzyme__row__id" column is DLT's internal row tracking mechanism , and schema mismatches occur when DLT expects consistent column structures between runs.​

Recommended Solutions

Solution 1: Consistent Schema Approach (Recommended)

Always include the is_deleted column in your silver table schema, regardless of the refresh type:

python
@Dlt.table def silver_table(): df = dlt.read("bronze_table") # Always add is_deleted column, defaulting to False for normal runs if not spark.conf.get("delete_missing_rows", "false").lower() == "true": df = df.withColumn("is_deleted", lit(False)) else: # Your existing full catch-up logic here pass return df

Solution 2: Separate Tables Strategy

Create separate silver tables for different refresh patterns:

  • silver_table_incremental: For daily updates

  • silver_table_full: For weekly full refreshes with delete tracking

Then use a view or downstream table to union them appropriately.​

Solution 3: Schema Evolution Configuration

Enable automatic schema evolution in your DLT pipeline configuration:

python
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

However, use this cautiously as it can mask unintended schema changes.​

Best Practices for Your Use Case

Weekly Full Refresh Strategy

  1. Use DLT's built-in full refresh capability rather than custom logic​

  2. Schedule separate pipelines for full vs incremental refreshes

  3. Leverage Change Data Feed (CDF) for deleted record tracking​

Handling Deleted Records

Consider implementing a more robust deleted record detection strategy:

python
@Dlt.table def silver_table_with_deletes(): current_data = dlt.read("bronze_latest") if is_full_refresh_run(): # Compare with previous state previous_data = dlt.read("bronze_previous") deleted_records = detect_deleted_records(previous_data, current_data) return current_data.union(deleted_records.withColumn("is_deleted", lit(True))) else: return current_data.withColumn("is_deleted", lit(False))

Implementation Recommendations

Option A: Redesign with Consistent Schema

  • Always include is_deleted column in silver table

  • Use pipeline parameters to control delete detection logic

  • Maintain schema consistency across all runs

Option B: Use DLT's Native Capabilities

  • Leverage apply_changes with apply_as_deletes parameter​

  • Enable Change Data Feed on source tables

  • Use SCD Type 1 for complete record removal

Option C: Dual Pipeline Approach

  • Separate pipeline for weekly full refresh with delete detection

  • Daily incremental pipeline for standard upserts

  • Downstream merge process to combine results

Avoiding Full Refresh Requirements

To prevent the need for full refreshes due to schema changes:

  1. Define complete schema upfront including optional columns

  2. Use schema evolution settings appropriate for your use case​

  3. Test schema changes in development before production deployment

  4. Consider using streaming tables instead of materialized views for more flexibility​

The most sustainable approach would be Option A (consistent schema) combined with DLT's native delete handling capabilities, as it maintains schema consistency while providing the flexibility you need for both refresh patterns.

View solution in original post