09-10-2025 06:20 AM
The docs for the AUTO CDC API state:
You must specify a column in the source data on which to sequence records, which Lakeflow Declarative Pipelines interprets as a monotonically increasing representation of the proper ordering of the source data.
Can this be something other than an integer, like a timestamp or UUID v7?
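For reference, this is roughly what I mean, sketched with the Python apply_changes API and made-up table/column names (as far as I can tell, the AUTO CDC call takes an equivalent sequence_by argument):

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("customers_silver")

# sequence_by is the column the pipeline uses to order change events.
# Here it is a timestamp; the question is whether other sortable types,
# such as UUID v7, are also acceptable.
dlt.apply_changes(
    target = "customers_silver",
    source = "customers_cdc_feed",   # hypothetical CDC source view
    keys = ["customer_id"],          # hypothetical primary key
    sequence_by = col("event_ts"),   # timestamp used for ordering
    stored_as_scd_type = 1,
)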
09-10-2025 06:37 AM
OK, later on the docs show a timestamp example (https://learn.microsoft.com/en-us/azure/databricks/dlt/cdc#use-multiple-columns-for-sequencing), but I'm still curious about UUID v7.
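If I read that page right, the multi-column variant boils down to passing a struct as the sequencing expression, roughly like this (sketch only, column names invented):

import dlt
from pyspark.sql.functions import col, struct

dlt.create_streaming_table("orders_silver")

# Ordering follows the struct fields left to right: ties on the
# timestamp are broken by the change id.
dlt.apply_changes(
    target = "orders_silver",
    source = "orders_cdc_feed",      # hypothetical CDC source view
    keys = ["order_id"],             # hypothetical primary key
    sequence_by = struct(col("event_ts"), col("change_id")),
    stored_as_scd_type = 1,
)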
09-10-2025 06:53 AM
Hi @Rjdudley,
It should also work, because UUID v7 values are generally monotonically increasing.
For example, here is an excerpt from the PostgreSQL implementation:
* Generate UUID version 7 per RFC 9562, with the given timestamp.
*
* UUID version 7 consists of a Unix timestamp in milliseconds (48
* bits) and 74 random bits, excluding the required version and
* variant bits. To ensure monotonicity in scenarios of high-
* frequency UUID generation, we employ the method "Replace
* Leftmost Random Bits with Increased Clock Precision (Method 3)",
* described in the RFC. […]
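So in practice you would just point sequence_by at the UUID column, something along these lines (a rough sketch with invented names, assuming the UUID v7 is stored as its canonical lowercase hex string):

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("events_silver")

# The 48-bit Unix-millisecond timestamp sits in the most significant bits
# of a UUID v7, and the canonical text form is fixed-width, so sorting the
# string column reproduces generation order (as long as the generator keeps
# the RFC 9562 monotonicity behaviour quoted above).
dlt.apply_changes(
    target = "events_silver",
    source = "events_cdc_feed",       # hypothetical CDC source view
    keys = ["event_key"],             # hypothetical primary key
    sequence_by = col("event_uuid"),  # UUID v7 stored as STRING
    stored_as_scd_type = 1,
)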
a month ago
Thanks Szymon, I'm familiar with the PostgreSQL implementation and was hoping Databricks would behave the same.