Hi @_its_akshaye
Yes, you can get this either from the DLT event log or by deriving it directly from the target table's Change Data Feed (CDF), and then aggregating by time.
Options that work well
- Use the DLT event log “rows written” metrics
  - Every pipeline writes a structured event log (stored under the pipeline's storage location, or queryable through the event_log() TVF for Unity Catalog pipelines) that includes per-table stats such as rows written/processed per update. You can query it and bucket by hour/day to see how many rows each update wrote over time.
- Derive metrics from the Silver table’s Change Data Feed (CDF)
  - CDF exposes row-level changes with _change_type values like insert, delete, update_preimage, and update_postimage; group these by _commit_timestamp (or a timestamp column of your own, for example your sequencing column) to produce hourly/daily counts of the inserts/updates/deletes that actually landed in Silver. CDF has to be enabled on the table; see the snippet right after this list.
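Before going the CDF route, it's worth confirming that CDF is actually enabled on the Silver table. A minimal sketch of the check/enable steps, assuming a Delta table you're allowed to ALTER directly (for a table defined inside DLT you'd set the same property via TBLPROPERTIES / table_properties in the table definition instead):

  -- Check whether delta.enableChangeDataFeed is already set to true
  SHOW TBLPROPERTIES <catalog>.<schema>.<silver_table>;

  -- Enable CDF going forward; only changes made after this point are captured
  ALTER TABLE <catalog>.<schema>.<silver_table>
  SET TBLPROPERTIES (delta.enableChangeDataFeed = true);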
Example queries
Below are templates you can run after replacing the placeholder identifiers; they produce hourly metrics. Switch date_trunc('hour', ...) to date_trunc('day', ...) for daily buckets.
1) From the DLT event log
- If your pipeline is in Unity Catalog, use the event_log() TVF (flow_progress events identify the target table via origin.flow_name, which for a plain table definition matches the table name):

  SELECT
    date_trunc('hour', timestamp) AS hour_bucket,
    origin.flow_name AS flow_name,
    SUM(details:flow_progress.metrics.num_output_rows::bigint) AS rows_written
  FROM event_log('<pipeline_id>')
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.metrics.num_output_rows IS NOT NULL
    AND origin.flow_name = '<your_silver_table_name>'
  GROUP BY 1, 2
  ORDER BY 1;
- If your pipeline uses HMS storage paths, read the Delta table at system/events:

  SELECT
    date_trunc('hour', timestamp) AS hour_bucket,
    origin.flow_name AS flow_name,
    SUM(details:flow_progress.metrics.num_output_rows::bigint) AS rows_written
  FROM delta.`dbfs:/pipelines/<pipeline-id>/system/events`
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.metrics.num_output_rows IS NOT NULL
    AND origin.flow_name = '<your_silver_table_name>'
  GROUP BY 1, 2
  ORDER BY 1;
Notes:
- flow_progress carries structured metrics for each table update; num_output_rows aligns with what the UI shows and is easy to roll up by time buckets.
- If you need to separate updates vs deletes precisely, the event log provides totals written per output dataset, but not semantically split by operation type. For that breakdown, use CDF (next option).
2) From the Silver table’s CDF (semantic split: insert/update/delete)
  SELECT
    date_trunc('hour', _commit_timestamp) AS hour_bucket,
    SUM(CASE WHEN _change_type = 'insert' THEN 1 ELSE 0 END) AS inserts,
    SUM(CASE WHEN _change_type = 'delete' THEN 1 ELSE 0 END) AS deletes,
    SUM(CASE WHEN _change_type = 'update_postimage' THEN 1 ELSE 0 END) AS updates
  FROM table_changes('<catalog>.<schema>.<silver_table>', '2024-12-01')
  GROUP BY 1
  ORDER BY 1;
Tips:
- _commit_timestamp is one of CDF's standard metadata columns (alongside _change_type and _commit_version) and reflects when the change was committed to Silver; if you'd rather bucket by event time, use your sequencing column (e.g., operation_date) instead.
- For updates, you typically count update_postimage rows; update_preimage rows represent the before-image and usually aren’t counted as “applied rows.”
- Retention: CDF is subject to the table’s retention policy and VACUUM; keep that in mind for long lookbacks.
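If you're not sure how far back CDF can actually serve changes, the table history gives a rough lower bound (the exact window depends on delta.logRetentionDuration and when VACUUM last ran), for example:

  -- The oldest commit still listed is roughly the earliest point table_changes() can read from
  DESCRIBE HISTORY <catalog>.<schema>.<silver_table>;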
When to use which
- Need exact operation split (insert/update/delete) at Silver: use CDF on the Silver table and aggregate by hour/day.
- Need a lightweight, “what was written per batch” view that mirrors the DLT UI: query event_log flow_progress and bucket by time (hour/day).
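If you end up using both, a quick sanity check is to line them up per hour. The sketch below is just that, a sketch: it reuses the placeholders from the queries above, assumes a Unity Catalog pipeline (event_log() TVF), and the two counts won't match exactly, since the event log counts rows written per flow update while CDF counts change rows recorded against the table:

  WITH from_event_log AS (
    SELECT
      date_trunc('hour', timestamp) AS hour_bucket,
      SUM(details:flow_progress.metrics.num_output_rows::bigint) AS rows_written
    FROM event_log('<pipeline_id>')
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.metrics.num_output_rows IS NOT NULL
      AND origin.flow_name = '<your_silver_table_name>'
    GROUP BY 1
  ),
  from_cdf AS (
    SELECT
      date_trunc('hour', _commit_timestamp) AS hour_bucket,
      COUNT(*) AS changed_rows
    FROM table_changes('<catalog>.<schema>.<silver_table>', '2024-12-01')
    WHERE _change_type IN ('insert', 'update_postimage', 'delete')
    GROUP BY 1
  )
  SELECT
    COALESCE(e.hour_bucket, c.hour_bucket) AS hour_bucket,
    e.rows_written,
    c.changed_rows
  FROM from_event_log e
  FULL OUTER JOIN from_cdf c
    ON e.hour_bucket = c.hour_bucket
  ORDER BY 1;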