Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Capture num_affected_rows in notebooks

BigJay
New Contributor II

If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged somewhere?

5 REPLIES

-werners-
Esteemed Contributor III

AFAIK plain Spark does not produce a num_affected_rows value; I assume you are executing Delta Lake operations.

You can fetch this from the JSON files stored in the _delta_log folder.

In those files there is a field called 'operationMetrics'.

https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log

Excellent video on how the Delta Lake transaction log works.
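
For example, here is a minimal sketch of reading operationMetrics straight out of a commit file in _delta_log. The table path and the commit file name are assumptions for illustration; each commit writes one JSON file, and the commitInfo record in it carries the metrics:

%python
# Minimal sketch, assuming a Delta table at /mnt/silver/events (hypothetical path)
# and commit number 1. Every commit writes a JSON file into _delta_log; the
# commitInfo record in that file carries the operationMetrics map.
log_file = "/mnt/silver/events/_delta_log/00000000000000000001.json"

(spark.read.json(log_file)
    .where("commitInfo IS NOT NULL")
    .select("commitInfo.operationMetrics")
    .show(truncate=False))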

Dan_Z
Databricks Employee

To expand on werners's answer, you can use the Delta API to get this information. I suggest you use Scala to access it. Here is some example code that pulls it out.

First we make a trial merge to test with. Here the table firstDF is just 1000 rows, with id values 1 to 1000.

%python
from delta.tables import DeltaTable

# Load the existing Delta table (1000 rows, id values 1 to 1000).
firstDelta = DeltaTable.forName(spark, "firstDF")
# Source DataFrame with id values 998 to 1003; 1001-1003 do not exist in the target.
secondDF = spark.range(998, 1004)

# Merge: insert any source rows whose id is not already in the target.
firstDelta.alias("first").merge(
    secondDF.alias("second"),
    "first.id = second.id") \
  .whenNotMatchedInsertAll() \
  .execute()

Next we extract one of the operation metrics from this merge operation:

%scala
import io.delta.tables._

// Grab the table and read the most recent entry from its history.
val firstDF = DeltaTable.forName("firstDF")
// operationMetrics is a map of metric name -> value; the values are strings.
val operationMetrics = firstDF.history(1).select("operationMetrics").collect()(0)(0).asInstanceOf[Map[String,String]]

// Number of rows the merge inserted into the target table.
operationMetrics("numTargetRowsInserted")

This returns "3" (the metrics come back as strings), since 1001, 1002, and 1003 were added.
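
As a side note, if you want to see every metric a commit records, a quick sketch is to inspect the full history; DESCRIBE HISTORY is standard Delta SQL:

%python
# Inspect recent commits on the table; the operationMetrics column holds the
# full map of metrics recorded for each operation.
display(spark.sql("DESCRIBE HISTORY firstDF")
        .select("version", "operation", "operationMetrics"))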

Similarly, you can do this with your Delta table after your updates to the target table.
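
To tie this back to the original question, here is a minimal sketch of capturing the metric and sending it to a logger from a notebook. The logger name and the silver table name are hypothetical:

%python
# Minimal sketch, assuming a silver table named "silver.events" and a standard
# Python logger (both names are assumptions for illustration).
import logging
from delta.tables import DeltaTable

logger = logging.getLogger("etl")

# history(1) returns only the most recent commit; operationMetrics collects
# as a dict of metric name -> string value.
last_commit = DeltaTable.forName(spark, "silver.events").history(1).collect()[0]
metrics = last_commit["operationMetrics"]

logger.info("Rows inserted by last operation: %s", metrics.get("numTargetRowsInserted"))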

jose_gonzalez
Databricks Employee

Hi @John Smith,

Please make sure to select @Dan Zafar's response as the best answer if this post solved your question. It will move the post to the top and help other customers with similar questions.

Thank you.

BigJay
New Contributor II

@Dan Zafar Thank you, I will try this.

-werners-
Esteemed Contributor III

Good one, Dan! I never thought of using the Delta API for this, but there you go.
