cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Tracking DBMS CDC

demost11
New Contributor II

We're using Databricks to incrementally extract data from SQL Server tables into S3. The data contains a timestamp column. We need a place to store the maximum retrieved timestamp per table so it can retrieved during the next run.

Does Databricks contain (or easily connect to) any key-value stores, or have similar functionality? It could of course be tracked using a Delta Lake table but implementing a "frequent updates by primary key" pattern in a columnar storage system seems like a bad idea.

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @demost11, Databricks provides several options for managing state or storing metadata. 

 

Databricks Secrets:

  • Databricks allows you to create and manage secrets, which are key-value pairs storing secret material (such as API keys, passwords, etc.). Each secret scope can hold up to 1000 secrets, and the maximum allowed secret value size is 128 KB.
  • You can create a secret scope and store your maximum retrieved timestamp as a secret value. Then, during subsequent runs, retrieve the timestamp from the secret scope.
  • Here’s how you can create a secret scope and store a secret key:# Create a secret scope (if not already created) databricks secrets create-scope --scope my_scope # Store the timestamp as a secret databricks secrets put --scope my_scope --key max_timestamp --string-value "2024-03-06T11:44:58"
  • Later, you can read the secret value using the same scope and key.

Custom Metadata Table:

  • You mentioned that using a Delta Lake table might not be ideal due to frequent updates by the primary key. However, consider creating a separate metadata table specifically for tracking the maximum retrieved timestamp.
  • This table can have a simple structure with columns like table_name, max_timestamp, and any other relevant information.
  • During each extraction, update this metadata table with the latest timestamp for each table.
  • While this approach involves maintaining an additional table, it provides flexibility and ease of querying.

External Key-Value Stores:

View solution in original post

1 REPLY 1

Kaniz
Community Manager
Community Manager

Hi @demost11, Databricks provides several options for managing state or storing metadata. 

 

Databricks Secrets:

  • Databricks allows you to create and manage secrets, which are key-value pairs storing secret material (such as API keys, passwords, etc.). Each secret scope can hold up to 1000 secrets, and the maximum allowed secret value size is 128 KB.
  • You can create a secret scope and store your maximum retrieved timestamp as a secret value. Then, during subsequent runs, retrieve the timestamp from the secret scope.
  • Here’s how you can create a secret scope and store a secret key:# Create a secret scope (if not already created) databricks secrets create-scope --scope my_scope # Store the timestamp as a secret databricks secrets put --scope my_scope --key max_timestamp --string-value "2024-03-06T11:44:58"
  • Later, you can read the secret value using the same scope and key.

Custom Metadata Table:

  • You mentioned that using a Delta Lake table might not be ideal due to frequent updates by the primary key. However, consider creating a separate metadata table specifically for tracking the maximum retrieved timestamp.
  • This table can have a simple structure with columns like table_name, max_timestamp, and any other relevant information.
  • During each extraction, update this metadata table with the latest timestamp for each table.
  • While this approach involves maintaining an additional table, it provides flexibility and ease of querying.

External Key-Value Stores:

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.