Tracking DBMS CDC
03-04-2024 06:49 AM
We're using Databricks to incrementally extract data from SQL Server tables into S3. The data contains a timestamp column. We need a place to store the maximum retrieved timestamp per table so it can be retrieved during the next run.
Does Databricks contain (or easily connect to) any key-value stores, or have similar functionality? It could of course be tracked using a Delta Lake table, but implementing a "frequent updates by primary key" pattern in a columnar storage system seems like a bad idea.
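For illustration, the per-table watermark pattern described above can be sketched as a tiny key-value interface: `get` returns the last stored maximum timestamp for a table and `set` upserts it after a successful extract. This is a minimal sketch under stated assumptions, not a Databricks feature: `WatermarkStore` and `extract_incremental` are hypothetical names, SQLite (Python's standard library) stands in for whatever key-value store is actually used, and the source rows are plain in-memory dicts. In a real job the `ts > watermark` filter would be pushed into the SQL Server query rather than applied in Python.

```python
import sqlite3
from datetime import datetime
from typing import Dict, List, Optional


class WatermarkStore:
    """Hypothetical key-value store: one row per source table holding the
    maximum timestamp extracted so far (the "high-water mark").
    SQLite is only a stand-in for a real key-value store."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS watermarks ("
            " table_name TEXT PRIMARY KEY,"
            " max_ts TEXT NOT NULL)"
        )

    def get(self, table_name: str) -> Optional[datetime]:
        # Returns None the first time a table is seen (i.e. do a full load).
        row = self.conn.execute(
            "SELECT max_ts FROM watermarks WHERE table_name = ?", (table_name,)
        ).fetchone()
        return datetime.fromisoformat(row[0]) if row else None

    def set(self, table_name: str, max_ts: datetime) -> None:
        # Upsert keyed by table name; the PRIMARY KEY makes REPLACE overwrite.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO watermarks (table_name, max_ts) VALUES (?, ?)",
                (table_name, max_ts.isoformat()),
            )


def extract_incremental(
    rows: List[Dict], store: WatermarkStore, table_name: str
) -> List[Dict]:
    """Return rows newer than the stored watermark, then advance it.
    Assumption: rows carry a datetime under the key "ts"."""
    last = store.get(table_name)
    new_rows = [r for r in rows if last is None or r["ts"] > last]
    if new_rows:
        # In a real pipeline, advance only after the batch is safely in S3.
        store.set(table_name, max(r["ts"] for r in new_rows))
    return new_rows


# Usage: two runs against the same (hypothetical) source table.
store = WatermarkStore()
source = [
    {"id": 1, "ts": datetime(2024, 3, 1)},
    {"id": 2, "ts": datetime(2024, 3, 2)},
]
first = extract_incremental(source, store, "dbo.orders")   # rows 1 and 2
source.append({"id": 3, "ts": datetime(2024, 3, 3)})
second = extract_incremental(source, store, "dbo.orders")  # only row 3
```

A strict `>` comparison can skip rows that share the watermark timestamp but are committed after the extract runs, so whether to use `>=` plus de-duplication depends on how the source column is populated. Since the state is a single small row per source table, the same `get`/`set` interface could also be backed by a Delta table updated with `MERGE INTO`; whether that per-run write overhead matters depends on how often the job runs.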