Lu_Wang_ENB_DBX
Databricks Employee
Databricks Employee

Recommendation: use a business/effective timestamp in sequence_by if your source can emit late/backdated changes and you want SCD2 history to reflect source event time, not bronze arrival/commit time. If ties are possible, use a STRUCT for deterministic ordering, e.g. STRUCT(business_ts, _commit_timestamp). AUTO CDC uses SEQUENCE BY as the logical order of CDC events, handles out-of-order arrivals, and supports multi-column sequencing via STRUCT.

Options

  1. Keep _commit_timestamp only — good if bronze commit order is already the business order you want. Databricks docs do show this pattern for Delta CDF examples.
  2. Use business timestamp only — best if source events can arrive late and must be ordered by source/effective time, not ingestion time. AUTO CDC docs describe SEQUENCE BY as the logical order from the source data.
  3. Use STRUCT(business_ts, _commit_timestamp) — best balance in practice: business time decides history, commit time breaks ties deterministically.

Summary: _commit_timestamp alone is enough when “arrival order = desired history order.” Otherwise include business timestamp.

View solution in original post