cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Row tracking in Delta tables

analyticsnerd
New Contributor II

What exactly is row tracking and why should we use it for our delta tables? Could you explain with an example how it works internally and is it mandatory to use?

 

1 ACCEPTED SOLUTION

Accepted Solutions

K_Anudeep
Databricks Employee
Databricks Employee

Hello @analyticsnerd ,

Enabling row tracking allows us to track row-level lineage in a Delta table across multiple versions. When enabled, delta creates/exposes two hidden metadata columns which can be accessed as _metadata.row_id and _metadata.row_commit_version

Row ID, which uniquely identifies a row across multiple versions of a table, and Row Commit Version, which makes it possible to check whether two rows with the same ID in two different versions of the table represent the same version of the row.

In simple terms, think of row tracking as giving every row in your Delta table a stable primary key + last_modified_version, managed by Delta itself. That’s mainly useful whenever you want incremental, row-level propagation of changes downstream.

 

Example:

  • Suppose you want to incrementally refresh the MV. To understand which exact base rows have changed since the last refresh, without row tracking enabled, it would be very difficult to determine.
  • Also, when you have a pipeline with Meddalion architecture, you can easily determine if a specific row in the gold table came from a row (row_id X) in the bronze table,” – which helps with audits and debugging data issues.
Anudeep

View solution in original post

5 REPLIES 5

szymon_dybczak
Esteemed Contributor III

Hi @analyticsnerd ,

Delta Lake row tracking allows Azure Databricks to track row-level lineage in a Delta table.

Row tracking (delta.enableRowTracking) in Delta Lake assigns a stable unique ID to each row and preserves metadata about when each row was last committed. It simplifies identifying row-level changes, creating faster-materialized views, auditing data, and more

Here are some benefits you can get when using this feature

szymon_dybczak_0-1764945313422.png

source: Row Lineage: Performance and Auditing Benefit | by Hubert Dudek | Medium

K_Anudeep
Databricks Employee
Databricks Employee

Hello @analyticsnerd ,

Enabling row tracking allows us to track row-level lineage in a Delta table across multiple versions. When enabled, delta creates/exposes two hidden metadata columns which can be accessed as _metadata.row_id and _metadata.row_commit_version

Row ID, which uniquely identifies a row across multiple versions of a table, and Row Commit Version, which makes it possible to check whether two rows with the same ID in two different versions of the table represent the same version of the row.

In simple terms, think of row tracking as giving every row in your Delta table a stable primary key + last_modified_version, managed by Delta itself. That’s mainly useful whenever you want incremental, row-level propagation of changes downstream.

 

Example:

  • Suppose you want to incrementally refresh the MV. To understand which exact base rows have changed since the last refresh, without row tracking enabled, it would be very difficult to determine.
  • Also, when you have a pipeline with Meddalion architecture, you can easily determine if a specific row in the gold table came from a row (row_id X) in the bronze table,” – which helps with audits and debugging data issues.
Anudeep

Thanks @K_Anudeep @szymon_dybczak ,

So is this mandatory to have i we have MVs correct?

 

szymon_dybczak
Esteemed Contributor III

Yes, to support incremental refresh in materialized views, source data should be stored in Delta tables, with row tracking and change data feed enabled.

Poorva21
New Contributor

Row tracking gives each Delta row a stable internal ID, so Delta can track inserts/updates/deletes across table versions—even when files are rewritten or compacted.

Suppose we have a Delta table:

id value
1A
2B

When row tracking is enabled, Delta Lake stores an internal row ID (not visible to users):

id value _row_id (internal)
1A001
2B002

Think of row_id as a stable fingerprint for that row.

Row tracking ensures:

Correct incremental pipelines- Even after compacting files, you still get accurate row-level changes.

Accurate CDF outputs- Pre-image and post-image rows are correctly paired.

Safe MERGE, UPDATE, DELETE- Delta knows exactly which rows were modified.

Better performance- Delta avoids expensive file-level scans because it knows exactly which rows changed.