Databricks Community

yit337 · 2 weeks ago

I use Lakeflow Connect to load data from my source databases to bronze tables. Then I have auto_cdc to track SCD2 changes in my silver tables. I use _commit_timestamp from the bronze CDF, as sequence_by property in auto_cdc in order to order the version of records. Is that enough, or shall I include business timestamp column to handle out-of-order events?

Lu_Wang_ENB_DBX · 2 weeks ago

Recommendation: use a business/effective timestamp in sequence_by if your source can emit late/backdated changes and you want SCD2 history to reflect source event time, not bronze arrival/commit time. If ties are possible, use a STRUCT for deterministic ordering, e.g. STRUCT(business_ts, _commit_timestamp). AUTO CDC uses SEQUENCE BY as the logical order of CDC events, handles out-of-order arrivals, and supports multi-column sequencing via STRUCT.

Options

Keep _commit_timestamp only — good if bronze commit order is already the business order you want. Databricks docs do show this pattern for Delta CDF examples.
Use business timestamp only — best if source events can arrive late and must be ordered by source/effective time, not ingestion time. AUTO CDC docs describe SEQUENCE BY as the logical order from the source data.
Use STRUCT(business_ts, _commit_timestamp) — best balance in practice: business time decides history, commit time breaks ties deterministically.

Summary: _commit_timestamp alone is enough when “arrival order = desired history order.” Otherwise include business timestamp.

View solution in original post

pradeep_singh · 2 weeks ago

To process late arriving data correctly you would need a business column that identifies when the data was created/updated at the source and have that as the first column in your struct of sequence_by columns .

A right sequence_by column/columns must be a monotonically increasing representation of the correct event order, with one distinct update per key at each sequencing value ._commit_timestamp alone is only enough if bronze commit time is the exact event order you want to preserve.

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

Lu_Wang_ENB_DBX · 2 weeks ago

Recommendation: use a business/effective timestamp in sequence_by if your source can emit late/backdated changes and you want SCD2 history to reflect source event time, not bronze arrival/commit time. If ties are possible, use a STRUCT for deterministic ordering, e.g. STRUCT(business_ts, _commit_timestamp). AUTO CDC uses SEQUENCE BY as the logical order of CDC events, handles out-of-order arrivals, and supports multi-column sequencing via STRUCT.

Options

Keep _commit_timestamp only — good if bronze commit order is already the business order you want. Databricks docs do show this pattern for Delta CDF examples.
Use business timestamp only — best if source events can arrive late and must be ordered by source/effective time, not ingestion time. AUTO CDC docs describe SEQUENCE BY as the logical order from the source data.
Use STRUCT(business_ts, _commit_timestamp) — best balance in practice: business time decides history, commit time breaks ties deterministically.

Summary: _commit_timestamp alone is enough when “arrival order = desired history order.” Otherwise include business timestamp.

Databricks Community

Does Lakeflow Connect guarantee no out-of-order records?

DAIS 2026 Speaker Spotlight Series #7 | Tushar Madan

🌟 Community Pulse: Your Weekly Roundup! May 11 – 17, 2026

Databricks Community Champion - May 2026 - Balaji J

Solution Accelerator Series | Media Mix Modeling (MMM)

DAIS 2026 | Community Virtual Contest – Showcase Your Skills & Win Exclusive Swag