In today’s world, data is everything. It drives decisions, powers automation, and fuels innovation. But with all this power comes a challenge: ensuring that the data we rely on remains accurate, trustworthy, and, most importantly, healthy. As organizations expand and their data pipelines grow more complex, tracking how and when data changes can quickly become overwhelming.
Imagine you’re managing a large-scale data pipeline, and suddenly, something goes wrong. A downstream process fails, a report is generated with missing data, or performance drops unexpectedly. How do you pinpoint the issue when there’s so much data flowing through multiple layers of your pipeline? This is where Delta Table History comes in to save the day.
Delta Table History is a feature in Databricks that provides visibility into every change, update, and modification made to your data tables. It logs all operations, from simple inserts to complex merges, and stores crucial metadata like who made the change, when it was made, and what was modified. Think of it as a logbook of every action that has taken place on your Delta Table, allowing you to monitor, audit, and troubleshoot your data in ways you’ve never imagined.
But that’s not all. Beyond simply tracking data changes, Delta Table History unlocks the potential for advanced operational monitoring. By digging deeper into the metadata and transaction logs, you can uncover patterns, identify inefficiencies, and set up automated alerts to keep you informed in real time.
Let’s explore how you can leverage Delta Table History to evaluate the health of your Delta tables.
Before deriving any metrics, let’s get to know the structure of Delta Table History. By running the history command, you can retrieve a lot of information, including the operation, user, and timestamp for each transaction to a Delta table. The operations are returned in reverse chronological order.
For Spark SQL syntax details, see DESCRIBE HISTORY.
See the Delta Lake API documentation for Scala/Java/Python syntax details.
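As a quick illustration, here is a minimal sketch of the command in a notebook cell (main.sales.orders is just a placeholder table name):
%sql
-- Return the commit history for a Delta table, most recent commits first
-- (main.sales.orders is a placeholder table name; the LIMIT clause is optional)
DESCRIBE HISTORY main.sales.orders LIMIT 10;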
A high-level schema of the Delta table history output is described below. For more details on the columns, especially operationMetrics, please refer to this documentation.
| Column | Type | Description |
| --- | --- | --- |
| version | long | Table version generated by the operation. |
| timestamp | timestamp | When this version was committed. |
| userId | string | ID of the user that ran the operation. |
| userName | string | Name of the user that ran the operation. |
| operation | string | Name of the operation. |
| operationParameters | map | Parameters of the operation (for example, predicates). |
| job | struct | Details of the job that ran the operation. |
| notebook | struct | Details of the notebook from which the operation was run. |
| clusterId | string | ID of the cluster on which the operation ran. |
| readVersion | long | Version of the table that was read to perform the write operation. |
| isolationLevel | string | Isolation level used for this operation. |
| isBlindAppend | boolean | Whether this operation appended data. |
| operationMetrics | map | Metrics of the operation (for example, number of rows and files modified). |
| userMetadata | string | User-defined commit metadata if it was specified. |
With this information at your fingertips, you can uncover invaluable insights to monitor the health of your Delta table.
Now that we’ve unlocked the history of changes, how do we transform this data into actionable insights? This is where we define some KPIs (Key Performance Indicators) to help create measurable metrics. These KPIs will help you track how well your Delta Table is performing and whether it’s operating within expected parameters.
But before tracking the metrics, it is important to understand the retention and availability of Delta table history data. Please go through the relevant Databricks documentation.
Now, here are some key KPIs that can be derived from details present in Delta Table History:
This KPI monitors the number of rows affected by operations (e.g., INSERT, UPDATE, DELETE) on a Delta table within a specific time period. It helps track the table's activity level and identify whether tuning is needed, for example, running OPTIMIZE to reduce the number of small files and improve query performance, or understanding trends in the number of updated, inserted, and deleted records.
This can be done simply using the query below:
%sql
WITH history AS (
  -- Get the Delta Table history using DESC HISTORY
  DESCRIBE HISTORY <table>
),
delta_history AS (
  -- Flatten the history results and extract the necessary columns
  SELECT
    operation,
    operationMetrics["numOutputRows"] AS numOutputRows,  -- Number of rows affected
    operationMetrics["numFiles"] AS numFiles,            -- Number of files written
    operationMetrics["bytesAdded"] AS bytesAdded,        -- Bytes of data added
    operationMetrics["bytesRemoved"] AS bytesRemoved,    -- Bytes of data removed
    CAST(timestamp AS DATE) AS operation_date            -- Extract the date from the timestamp
  FROM
    history
)
-- Calculate the total rows affected, files written, and bytes changed, split by operation and date
SELECT
  operation_date,                             -- Group by date
  operation,                                  -- Group by operation type (WRITE, UPDATE, DELETE, etc.)
  SUM(numOutputRows) AS total_rows_affected,  -- Total rows affected by operations per day
  SUM(numFiles) AS total_files_written,       -- Total files written by operations per day
  SUM(bytesAdded) AS total_bytes_added,       -- Total bytes added by operations per day
  SUM(bytesRemoved) AS total_bytes_removed    -- Total bytes removed by operations per day
FROM
  delta_history
GROUP BY
  operation_date,
  operation
ORDER BY
  operation_date DESC,                        -- Sort by date (most recent first)
  operation;
The Data Insertion/Update Speed KPI measures the time it takes to perform a data modification operation on a Delta Table, such as an INSERT, UPDATE, or MERGE. This metric helps assess the performance of data modification operations in your Delta Lake pipeline, providing insights into how efficiently data is being ingested or updated.
Apart from total execution time, you can visualize scan time and rewrite time as well.
%sql
WITH history AS (
  -- Get the Delta Table history using DESC HISTORY
  DESCRIBE HISTORY <table>
),
delta_history AS (
  -- Flatten the history results and extract the necessary columns
  SELECT
    operation,
    operationMetrics["executionTimeMs"] AS execution_time_ms,  -- Total execution time in milliseconds
    operationMetrics["scanTimeMs"] AS scan_time_ms,            -- Time taken for scanning data
    operationMetrics["rewriteTimeMs"] AS rewrite_time_ms,      -- Time taken for rewriting data
    CAST(timestamp AS DATE) AS operation_date                  -- Extract the date from the timestamp
  FROM
    history
  WHERE
    operation IN ('WRITE', 'UPDATE', 'MERGE', 'DELETE')        -- Filter for data modification operations only
)
-- Calculate the execution time breakdown (Scan Time, Rewrite Time, and Total Time)
SELECT
  operation_date,                                   -- Group by operation date
  operation,                                        -- Group by operation type (WRITE, UPDATE, MERGE, DELETE)
  AVG(scan_time_ms) AS avg_scan_time_ms,            -- Average scan time per operation
  AVG(rewrite_time_ms) AS avg_rewrite_time_ms,      -- Average rewrite time per operation
  AVG(execution_time_ms) AS avg_execution_time_ms,  -- Average total execution time
  MIN(scan_time_ms) AS min_scan_time_ms,            -- Minimum scan time
  MAX(scan_time_ms) AS max_scan_time_ms,            -- Maximum scan time
  MIN(rewrite_time_ms) AS min_rewrite_time_ms,      -- Minimum rewrite time
  MAX(rewrite_time_ms) AS max_rewrite_time_ms,      -- Maximum rewrite time
  MIN(execution_time_ms) AS min_execution_time_ms,  -- Minimum execution time
  MAX(execution_time_ms) AS max_execution_time_ms   -- Maximum execution time
FROM
  delta_history
GROUP BY
  operation_date,
  operation
ORDER BY
  operation_date DESC,                              -- Sort by date (most recent first)
  operation;                                        -- Then by operation type
You can also make this more interesting by comparing scan and rewrite times per hour, or by breaking them down by user, as in the sketch below.
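Here is a minimal sketch of the hourly variant, reusing the same placeholder table name and the scan/rewrite metrics from the query above:
%sql
WITH history AS (
  -- Get the Delta Table history using DESC HISTORY
  DESCRIBE HISTORY <table>
)
SELECT
  date_trunc('HOUR', timestamp) AS operation_hour,               -- Truncate the commit timestamp to the hour
  operation,
  AVG(operationMetrics["scanTimeMs"]) AS avg_scan_time_ms,       -- Average scan time within the hour
  AVG(operationMetrics["rewriteTimeMs"]) AS avg_rewrite_time_ms  -- Average rewrite time within the hour
FROM
  history
WHERE
  operation IN ('UPDATE', 'MERGE', 'DELETE')                     -- Operations that report scan/rewrite metrics
GROUP BY
  operation_hour,
  operation
ORDER BY
  operation_hour DESC,
  operation;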
Tracking the most active users or jobs impacting table updates provides valuable insights into the behavior and health of your table, especially in cases where multiple processes and stakeholders are making changes. By monitoring who is making the most updates and understanding the volume of changes being made, you can proactively address issues, detect inefficiencies, and ensure smooth operation across all users and processes involved.
This can be calculated with the below query:
%sql
WITH history AS (
  -- Get the Delta Table history using DESC HISTORY
  DESCRIBE HISTORY <table>
),
delta_history AS (
  -- Flatten the history results and extract the necessary columns
  SELECT
    userName,                                              -- The user making the modification
    job["jobId"] AS jobId,                                 -- The ID of the job executing the operation (NULL for interactive runs)
    operation,                                             -- The type of operation (WRITE, UPDATE, MERGE, etc.)
    operationMetrics["numOutputRows"] AS num_output_rows,  -- Rows affected by the operation
    CAST(timestamp AS DATE) AS operation_date              -- Extract the date from the timestamp
  FROM
    history
  WHERE
    operation IN ('INSERT', 'UPDATE', 'MERGE', 'WRITE')    -- Filter for data modification operations only
)
-- Top Users (only operations where jobId is NULL, i.e. interactive runs)
SELECT
  userName,                                    -- Group by user
  COUNT(*) AS num_operations,                  -- Count the number of operations performed by the user
  SUM(num_output_rows) AS total_rows_affected  -- Sum of rows affected by the user
FROM
  delta_history
WHERE
  jobId IS NULL                                -- Only consider operations without a jobId
GROUP BY
  userName
ORDER BY
  num_operations DESC,                         -- Sort by number of operations (most active first)
  total_rows_affected DESC                     -- Then by the number of rows affected
LIMIT 10;                                      -- Limit the results to the top 10 users

-- Top Jobs (only operations where jobId is NOT NULL)
-- Note: a WITH clause applies only to the statement that follows it, so the CTEs are repeated here
WITH history AS (
  DESCRIBE HISTORY <table>
),
delta_history AS (
  SELECT
    job["jobId"] AS jobId,
    operationMetrics["numOutputRows"] AS num_output_rows
  FROM
    history
  WHERE
    operation IN ('INSERT', 'UPDATE', 'MERGE', 'WRITE')
)
SELECT
  jobId,                                       -- Group by job
  COUNT(*) AS num_operations,                  -- Count the number of operations performed by the job
  SUM(num_output_rows) AS total_rows_affected  -- Sum of rows affected by the job
FROM
  delta_history
WHERE
  jobId IS NOT NULL                            -- Only consider operations with a non-null jobId
GROUP BY
  jobId
ORDER BY
  num_operations DESC,                         -- Sort by number of operations (most active first)
  total_rows_affected DESC                     -- Then by the number of rows affected
LIMIT 10;                                      -- Limit the results to the top 10 jobs
Other important KPIs that can be derived from Delta Table History include the following:
This KPI measures the volume of data updates made by individual users or specific jobs within a Delta Table over a defined period. By tracking this KPI, you can gain insights into which users or jobs are driving the most data modifications.
The Transaction Throughput KPI measures the number of operations performed on a Delta Table within a specific time period. It helps assess the overall activity level of the table and the efficiency of data processing workflows.
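A rough sketch of this KPI, again using the placeholder table name, simply counts commits per day and operation type:
%sql
-- Transaction throughput: number of commits per day, split by operation type
WITH history AS (
  DESCRIBE HISTORY <table>
)
SELECT
  CAST(timestamp AS DATE) AS operation_date,  -- Commit date
  operation,                                  -- Operation type
  COUNT(*) AS num_transactions                -- Number of commits that day
FROM
  history
GROUP BY
  operation_date,
  operation
ORDER BY
  operation_date DESC,
  operation;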
This KPI measures the increase in table size or the number of files over time, providing insight into how quickly your Delta Table is growing. By tracking this metric, you can monitor both the volume of data being added and the rate at which new data files are created.
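One possible sketch, limited to whatever history is still retained for the table, tracks bytes and files added per day along with a running net-growth figure:
%sql
-- Table growth: bytes and files added per day, plus a running net-growth total
WITH history AS (
  DESCRIBE HISTORY <table>
),
daily AS (
  SELECT
    CAST(timestamp AS DATE) AS operation_date,
    COALESCE(SUM(CAST(operationMetrics["bytesAdded"] AS BIGINT)), 0) AS bytes_added,
    COALESCE(SUM(CAST(operationMetrics["bytesRemoved"] AS BIGINT)), 0) AS bytes_removed,
    COALESCE(SUM(CAST(operationMetrics["numFiles"] AS BIGINT)), 0) AS files_written
  FROM
    history
  GROUP BY
    CAST(timestamp AS DATE)
)
SELECT
  operation_date,
  bytes_added,
  bytes_removed,
  files_written,
  SUM(bytes_added - bytes_removed) OVER (ORDER BY operation_date) AS net_bytes_growth  -- Running net growth across the retained history
FROM
  daily
ORDER BY
  operation_date DESC;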
This KPI measures the effectiveness and efficiency of managing changes to the structure and historical states of your Delta Table. By tracking this KPI, you can monitor how frequently the table schema evolves—such as the addition of new columns or data type changes—and ensure that schema updates are handled smoothly.
These updates can be identified using the ‘operation’ column by checking for ALTER TABLE commands such as “ADD COLUMNS”.
Note: This is only valid when tables are updated using ALTER commands. If schema changes are applied through schema evolution during a merge, they cannot be tracked this way in the history table.
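A minimal sketch for spotting such changes filters the history on ALTER-style operations; treat the filter below as an assumption to adjust, since the exact operation names (such as ‘ADD COLUMNS’ or ‘CHANGE COLUMN’) can vary by runtime version:
%sql
-- Schema change activity: count ALTER-style commits per day
WITH history AS (
  DESCRIBE HISTORY <table>
)
SELECT
  CAST(timestamp AS DATE) AS operation_date,
  operation,
  COUNT(*) AS num_schema_changes
FROM
  history
WHERE
  operation LIKE '%COLUMN%'  -- e.g. ADD COLUMNS, CHANGE COLUMN, RENAME COLUMN, DROP COLUMNS
GROUP BY
  operation_date,
  operation
ORDER BY
  operation_date DESC,
  operation;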
The Data Retention Compliance (File Cleanliness) KPI measures how effectively data is managed in accordance with retention policies, ensuring that obsolete or unnecessary files are cleaned up on schedule (for example, through regular VACUUM runs).
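As a sketch, the history can tell you when VACUUM last ran on the table (vacuum runs are typically recorded with operation names such as ‘VACUUM START’ and ‘VACUUM END’):
%sql
-- Data retention compliance: most recent VACUUM activity on the table
WITH history AS (
  DESCRIBE HISTORY <table>
)
SELECT
  operation,                  -- e.g. 'VACUUM START' or 'VACUUM END'
  MAX(timestamp) AS last_run  -- Most recent occurrence of each
FROM
  history
WHERE
  operation LIKE 'VACUUM%'
GROUP BY
  operation;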
These are just some of the KPIs that can be derived; you can, of course, customize them and create new KPIs to suit your needs.
Once you have all these KPIs, it’s time to turn them into real-time monitoring tools. In Databricks, you can set up dashboards that visually display these KPIs, allowing you to monitor the health of your Delta tables at a glance. Imagine having a dashboard that shows transaction throughput, execution times, and change volumes all in one place; simple examples include daily charts of rows affected, average execution times, and the most active users and jobs, built directly from the queries above.
To create an AI/BI dashboard, please refer to the Databricks documentation. You can also enable Genie spaces to help answer ad hoc questions about the datasets you add.
Setting up alerts can help table owners stay on top of any unexpected events that crop up. Suppose you're managing a Delta table that is critical to your data pipeline. This table experiences frequent transactions, and its performance, data growth, and operational efficiency directly impact downstream processes. The table is essential for running business-critical reports and feeding other systems.
In such cases, to ensure the health of the Delta table, it’s essential to set up automated alerts when these KPIs exceed predefined thresholds. For example, you may want to be alerted when write execution times spike, transaction throughput drops unexpectedly, table growth accelerates beyond normal levels, or VACUUM has not run within its expected window.
These alerts allow you to act proactively, preventing data issues from escalating into critical problems.
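As a sketch of a query that could back such an alert in Databricks SQL, the following returns the number of hours since the last data-modifying commit; the threshold itself is configured on the alert rather than in the query, and the table name is again a placeholder:
%sql
-- Staleness check: hours since the last data-modifying commit on the table
WITH history AS (
  DESCRIBE HISTORY <table>
)
SELECT
  MAX(timestamp) AS last_modification,
  timestampdiff(HOUR, MAX(timestamp), current_timestamp()) AS hours_since_last_modification
FROM
  history
WHERE
  operation IN ('WRITE', 'UPDATE', 'MERGE', 'DELETE');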
To know more about setting up alerts, please refer to the Databricks documentation.
While custom KPIs and alerts derived from Delta Table History offer deep control and flexibility, Databricks also provides a built-in anomaly detection feature as part of its Lakehouse Monitoring suite. This feature, currently in beta, helps you monitor table-level data quality issues like freshness and completeness - without requiring any modifications to your data pipelines or additional ETL overhead.
🔎 What It Detects
Databricks' anomaly detection automatically scans tables in a Unity Catalog schema and flags issues such as stale tables (freshness) and unexpected changes in data volume (completeness).
To learn more about this offering, please refer to Databricks documentation.
By leveraging Delta Table History and tracking KPIs, you gain control over the operational health of your Delta Tables. With the right tools and insights, you can ensure that your data pipeline runs smoothly, efficiently, and without surprises. From tracking data changes to measuring performance and identifying issues early, Delta Table History offers a robust solution for any data-driven organization.
So, take charge of your data today and make your Delta Tables not just reliable but resilient. The road to operational excellence begins with understanding what’s happening behind the scenes, and Delta Table History is your perfect guide.