Capture num_affected_rows in notebooks

BigJay
New Contributor II

If I run some code, say an ETL process that migrates data from bronze to silver storage, each cell reports num_affected_rows in a table when it executes. I want to capture that value and log it with my logger. Is it stored in a variable or written to a system log somewhere?

5 replies

-werners-
Esteemed Contributor III

AFAIK plain Spark does not have num_affected_rows. I assume you are running Delta Lake operations.

You can fetch this from the JSON commit files stored in the table's _delta_log folder.

In those files, each commit's commitInfo entry has a member called 'operationMetrics'.
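A minimal sketch of reading those files with plain Python, assuming the table directory is reachable as a local filesystem path (on Databricks that would presumably be a /dbfs/... path, which is an assumption here). The function name `latest_operation_metrics` is hypothetical. The sketch ignores checkpoint files, so it only looks at the newest numbered commit file:

```python
import json
from pathlib import Path


def latest_operation_metrics(table_path):
    """Return (version, operation, operationMetrics) for the newest commit file.

    Hypothetical helper, sketch only: it reads the newline-delimited JSON
    commit files directly and ignores checkpoints, so it assumes the newest
    commit file is still present in _delta_log.
    """
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named by zero-padded version, e.g. 00000000000000000001.json;
    # skip names that are not a plain version number.
    commits = [p for p in log_dir.glob("*.json") if p.stem.isdigit()]
    if not commits:
        raise FileNotFoundError(f"no commit files found in {log_dir}")
    latest = max(commits, key=lambda p: int(p.stem))

    with latest.open() as f:
        for line in f:
            if not line.strip():
                continue
            action = json.loads(line)  # one action (commitInfo, add, remove, ...) per line
            if "commitInfo" in action:
                info = action["commitInfo"]
                return int(latest.stem), info.get("operation"), info.get("operationMetrics", {})
    # This commit file had no commitInfo action
    return int(latest.stem), None, {}
```

For example, `version, op, metrics = latest_operation_metrics("/dbfs/path/to/table")` (path hypothetical). The metric values come back as strings, exactly as they are stored in the log.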

https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log

It's an excellent video on how the Delta Lake transaction log works.

Dan_Z
Honored Contributor

To expand on werners's answer, you can use the Delta API to get this information. I suggest you use Scala to access it. Here is some example code that pulls out one of the operation metrics.

First, we make a trial merge to test with. Here firstDelta is a Delta table of just 1000 rows, with ids 1 to 1000.

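%python
from delta.tables import DeltaTable

# Assumed setup: firstDF is a Delta table holding ids 1 to 1000
spark.range(1, 1001).write.format("delta").mode("overwrite").saveAsTable("firstDF")

firstDelta = DeltaTable.forName(spark, "firstDF")
secondDF = spark.range(998, 1004)  # ids 998..1003; only 1001..1003 are new

# Insert every source row whose id has no match in the target
(firstDelta.alias("first")
    .merge(secondDF.alias("second"), "first.id = second.id")
    .whenNotMatchedInsertAll()
    .execute())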

Next we extract one of the operation metrics from this merge operation:

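%scala
import io.delta.tables._

val firstDelta = DeltaTable.forName("firstDF")
// history(1) returns only the latest commit, so run this right after the merge
val operationMetrics = firstDelta.history(1)
  .select("operationMetrics")
  .head()
  .getMap[String, String](0)

// Metric values are stored as strings, e.g. "3"
operationMetrics("numTargetRowsInserted").toLong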

This returns 3, since 1001, 1002, and 1003 were inserted: they are the only ids in secondDF with no match in firstDF.
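To check that count by hand, here is a plain-Python model of an insert-only merge. It is only an illustration of the matching logic, not how Delta executes a MERGE: a source row is inserted exactly when no target row has the same id.

```python
# Plain-Python model of an insert-only MERGE (illustration, not Delta's implementation):
# a source row is inserted only when no target row has the same id.
target_ids = set(range(1, 1001))     # firstDF: ids 1..1000
source_ids = list(range(998, 1004))  # spark.range(998, 1004): ids 998..1003

inserted = [i for i in source_ids if i not in target_ids]
num_target_rows_inserted = len(inserted)
print(inserted, num_target_rows_inserted)  # [1001, 1002, 1003] 3
```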

You can do the same with your own Delta table right after each update to the target table.
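Since the original question is about getting the numbers into a logger, here is a minimal sketch of that last step in Python. It assumes you already have the operationMetrics map as a Python dict. In PySpark, `DeltaTable.forName(spark, "firstDF").history(1).select("operationMetrics").collect()[0][0]` should give you one, because a map column comes back as a dict. The helper name `log_operation_metrics` and the log message format are hypothetical:

```python
import logging


def log_operation_metrics(logger, table_name, metrics):
    """Log Delta operationMetrics and return them with numeric values as ints.

    Hypothetical helper: `metrics` is assumed to be the operationMetrics map
    from DeltaTable.history(1), whose values are stored as strings.
    """
    parsed = {}
    for key, value in metrics.items():
        try:
            parsed[key] = int(value)
        except (TypeError, ValueError):
            parsed[key] = value  # keep anything non-numeric unchanged
    # Sorted keys keep the log lines in a stable order between runs
    for key in sorted(parsed):
        logger.info("%s %s=%s", table_name, key, parsed[key])
    return parsed
```

Usage right after the merge might look like `log_operation_metrics(logging.getLogger("etl"), "firstDF", metrics)`, where `metrics` is the dict from the `history(1)` lookup above.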

Hi @John Smith,

Please make sure to select @Dan Zafar's response as the best answer if it solved your question. That moves the post to the top and helps other customers with similar questions in the future.

Thank you.

BigJay
New Contributor II

@Dan Zafar Thank you, I will try this.

-werners-
Esteemed Contributor III

Good one, Dan! I never thought of using the Delta API for this, but there you go.
