10-14-2021 06:36 PM
If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged somewhere?
10-15-2021 12:30 AM
AFAIK plain Spark does not expose num_affected_rows; I assume you are executing Delta Lake operations.
You can fetch this from the JSON files stored in the _delta_log folder.
In those files there is a field called 'operationMetrics'.
https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log
Excellent video on how the delta lake transaction log works.
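To illustrate what those log entries look like, here is a small Python sketch that parses one line of a commit file from the _delta_log folder and pulls out operationMetrics. The JSON below is a trimmed-down sample of a commitInfo action; real commit files (named like 00000000000000000001.json) are newline-delimited JSON containing several action types per file.

```python
import json

# Trimmed-down sample of one line from a _delta_log commit file.
# Real files contain other actions (add, remove, metaData) on separate lines.
sample_commit_line = (
    '{"commitInfo": {"operation": "MERGE", '
    '"operationMetrics": {"numTargetRowsInserted": "3", "numTargetRowsUpdated": "0"}}}'
)

def extract_operation_metrics(line):
    """Return the operationMetrics dict from a commitInfo action, or None."""
    action = json.loads(line)
    commit_info = action.get("commitInfo")
    if commit_info is None:
        return None
    return commit_info.get("operationMetrics")

metrics = extract_operation_metrics(sample_commit_line)
print(metrics)  # note: metric values are stored as strings in the log
```

In practice you would loop over every line of each commit file and skip the lines where extract_operation_metrics returns None.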
10-18-2021 12:30 PM
To expand on werners's answer, you can use the Delta API to get this information. I suggest using Scala to access it. Here is some example code that pulls the metrics out of the table history.
First we make a trial merge to test with. Here firstDelta is just 1000 rows, with values 1 to 1000.
%python
from delta.tables import DeltaTable

firstDelta = DeltaTable.forName(spark, "firstDF")
secondDF = spark.range(998, 1004)  # ids 998 through 1003

# Insert the ids that are not already in the target table
firstDelta.alias("first").merge(
    secondDF.alias("second"),
    "first.id = second.id") \
  .whenNotMatchedInsertAll() \
  .execute()
Next we extract one of the operation metrics from this merge operation:
%scala
import io.delta.tables._

val firstDF = DeltaTable.forName("firstDF")

// history(1) returns the most recent commit; operationMetrics is a Map[String, String]
val operationMetrics = firstDF.history(1)
  .select("operationMetrics")
  .collect()(0)(0)
  .asInstanceOf[Map[String, String]]
operationMetrics("numTargetRowsInserted")
This returns 3, since 1001, 1002, and 1003 were added.
Similarly, you can do this with your Delta table after your updates to the target table.
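Since the original question was about capturing this in a logger: once you have the operationMetrics map, feeding it to a standard logger is straightforward. A minimal Python sketch follows; the metrics dict is hard-coded here so it is self-contained, but in a notebook it would come from the DeltaTable history query shown above.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.bronze_to_silver")

# Hard-coded for illustration; in practice this comes from
# DeltaTable.forName(spark, ...).history(1) as shown above.
operation_metrics = {"numTargetRowsInserted": "3", "numTargetRowsUpdated": "0"}

# Flatten the metrics into one stable, searchable log line
message = ", ".join(f"{k}={v}" for k, v in sorted(operation_metrics.items()))
logger.info("merge metrics: %s", message)
```

Logging one flattened line (rather than one line per metric) keeps related counts together, which is easier to grep in aggregated ETL logs.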
10-21-2021 10:05 AM
Hi @John Smith,
Please make sure to select @Dan Zafar's response as the best answer if this post solved your question. It will move the post to the top and help solve future questions from other customers.
Thank you.
10-22-2021 11:23 AM
@Dan Zafar Thank you, I will try this.
10-19-2021 01:25 AM
Good one Dan! I never thought of using the delta api for this but there you go.