AUTO CDC API and sequence column

Rjdudley
Honored Contributor

The docs for the AUTO CDC API state:

You must specify a column in the source data on which to sequence records, which Lakeflow Declarative Pipelines interprets as a monotonically increasing representation of the proper ordering of the source data.

Can this be something other than an integer, like a timestamp or UUID v7?

Rjdudley
Honored Contributor

OK, later on the docs show a timestamp example (https://learn.microsoft.com/en-us/azure/databricks/dlt/cdc#use-multiple-columns-for-sequencing), but I'm still curious about UUID v7.

szymon_dybczak
Esteemed Contributor III

Hi @Rjdudley ,

They should also work, because UUID v7 values are generally monotonically increasing.
For example, this is an excerpt from the PostgreSQL implementation:

 * Generate UUID version 7 per RFC 9562, with the given timestamp.
 *
 * UUID version 7 consists of a Unix timestamp in milliseconds (48
 * bits) and 74 random bits, excluding the required version and
 * variant bits. To ensure monotonicity in scenarios of high-frequency
 * UUID generation, we employ the method "Replace Leftmost Random
 * Bits with Increased Clock Precision (Method 3)", described in
 * the RFC. […]
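
To make the ordering argument concrete, here is a minimal Python sketch of the RFC 9562 bit layout described in the excerpt. The helper name `uuid7_at` is hypothetical, and the sketch does not implement Method 3, so it only guarantees ordering across different milliseconds. It also assumes the sequencing column is compared the way strings normally sort; I have not verified how AUTO CDC handles a UUID string column.

```python
import os
import uuid


def uuid7_at(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: build a UUID v7 for a given Unix time in ms.

    Layout per RFC 9562: 48-bit timestamp, 4-bit version, 12 random bits,
    2-bit variant, 62 random bits. No sub-millisecond counter (Method 3)
    is applied, so IDs within the same millisecond are not ordered.
    """
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = (rand >> 62) & 0xFFF                 # 12 bits after the version
    rand_b = rand & ((1 << 62) - 1)               # 62 bits after the variant

    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# Three events, each in a different millisecond.
timestamps_ms = [1_700_000_000_000, 1_700_000_000_001, 1_700_000_000_500]
ids = [uuid7_at(ms) for ms in timestamps_ms]

# Sorting the canonical string form recovers the generation order,
# because the timestamp occupies the leading hex digits.
assert sorted(str(u) for u in ids) == [str(u) for u in ids]
```

Because the timestamp sits in the most significant bits, the canonical lowercase hex strings sort in generation order whenever the timestamps differ. Within a single millisecond the order depends on the generator, which is why implementations such as the one quoted above add extra clock precision.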

 


Rjdudley
Honored Contributor

Thanks Szymon, I'm familiar with the PostgreSQL implementation and was hoping Databricks would behave the same.