Data Engineering

AUTO CDC API and sequence column

Rjdudley
Honored Contributor

The docs for the AUTO CDC API state:

You must specify a column in the source data on which to sequence records, which Lakeflow Declarative Pipelines interprets as a monotonically increasing representation of the proper ordering of the source data.

Can this be something other than an integer, like a timestamp or UUID v7?
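
To make the question concrete, here is a toy model of what a sequencing column does. It is not how Lakeflow Declarative Pipelines implements AUTO CDC; the function `apply_changes_toy` and the event fields (`id`, `name`, `ts`) are hypothetical names chosen for illustration. The sketch assumes only that sequence values can be compared with `>`, which holds for integers, timestamps, and fixed-width strings alike.

```python
from datetime import datetime


def apply_changes_toy(events, key, sequence_by):
    """Toy SCD type 1 upsert: per key, keep the event with the greatest sequence value."""
    latest = {}
    for event in events:
        current = latest.get(event[key])
        # Only overwrite when the incoming event is newer according to the sequence column.
        if current is None or event[sequence_by] > current[sequence_by]:
            latest[event[key]] = event
    return latest


# Hypothetical change events; the second one arrives late (out of order).
events = [
    {"id": 1, "name": "new", "ts": datetime(2025, 1, 1, 12, 0, 5)},
    {"id": 1, "name": "old", "ts": datetime(2025, 1, 1, 12, 0, 1)},
    {"id": 2, "name": "only", "ts": datetime(2025, 1, 1, 12, 0, 3)},
]

result = apply_changes_toy(events, key="id", sequence_by="ts")
```

Because the late event for `id` 1 carries an older timestamp, it does not overwrite the newer row. Nothing in this toy logic depends on the sequence value being an integer, only on it ordering the events correctly.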

2 REPLIES

Rjdudley
Honored Contributor

OK, later on the docs show a timestamp example (https://learn.microsoft.com/en-us/azure/databricks/dlt/cdc#use-multiple-columns-for-sequencing), but I'm still curious about UUID v7.

szymon_dybczak
Esteemed Contributor III

Hi @Rjdudley ,

They should also work, because UUID v7 values are generally monotonically increasing.
For example, here is an excerpt from the PostgreSQL implementation:

 

     * Generate UUID version 7 per RFC 9562, with the given timestamp.
     *
     * UUID version 7 consists of a Unix timestamp in milliseconds (48
     * bits) and 74 random bits, excluding the required version and
     * variant bits. To ensure monotonicity in scenarios of high-
     * frequency UUID generation, we employ the method "Replace
     * Leftmost Random Bits with Increased Clock Precision (Method 3)",
     * described in the RFC. […]
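
A minimal sketch of why this matters for sequencing: the snippet below builds UUID v7 values by hand from the RFC 9562 field layout (48-bit millisecond timestamp, version, 12 random bits, variant, 62 random bits) and checks that their string form sorts in timestamp order. The helper `uuid7_at` is a hypothetical name, not part of any library, and it does not implement Method 3 from the excerpt above.

```python
import os
import uuid


def uuid7_at(unix_ms: int) -> uuid.UUID:
    """Build a UUID v7 for the given Unix timestamp in milliseconds."""
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 of them used
    rand_a = (rand >> 62) & 0xFFF                  # 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# Three IDs generated at distinct, increasing milliseconds.
ids = [uuid7_at(t) for t in (1_700_000_000_000, 1_700_000_000_001, 1_700_000_000_500)]
as_strings = [str(u) for u in ids]
is_time_ordered = as_strings == sorted(as_strings)
```

Across different milliseconds the timestamp prefix decides the order, so `is_time_ordered` is true. Two IDs built by this naive helper within the same millisecond would be ordered by their random bits, which is the gap the PostgreSQL comment says Method 3 closes. Whether a given UUID v7 column is safe as a sequence column therefore depends on how the generator handles same-millisecond values.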

 
