10-14-2021 06:36 PM
If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged somewhere?
10-15-2021 12:30 AM
AFAIK plain Spark does not report num_affected_rows; I assume you are executing Delta Lake operations.
You can fetch this from the JSON commit files stored in the table's _delta_log folder.
In those files there is a member called 'operationMetrics'.
https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log
Excellent video on how the delta lake transaction log works.
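To make that concrete, here is a minimal sketch of reading the metrics straight out of a commit file. It assumes the layout described in the talk above: each commit under `<table path>/_delta_log/` is a newline-delimited JSON file of actions, and the `commitInfo` action carries `operationMetrics`. The sample string below is a trimmed, made-up commit for illustration.

```python
import json

def extract_operation_metrics(commit_file_text):
    """Parse one _delta_log commit file (newline-delimited JSON actions)
    and return the operationMetrics map from its commitInfo action."""
    for line in commit_file_text.splitlines():
        action = json.loads(line)
        if "commitInfo" in action:
            return action["commitInfo"].get("operationMetrics", {})
    return {}

# Trimmed-down example of a commitInfo action (values are made up):
sample = ('{"commitInfo": {"operation": "MERGE", '
          '"operationMetrics": {"numTargetRowsInserted": "3"}}}')
print(extract_operation_metrics(sample))  # {'numTargetRowsInserted': '3'}
```

Note the metric values are stored as strings in the log, so cast them if you need numbers.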
10-18-2021 12:30 PM
To expand on werners' answer, you can use the Delta API to get this information. I suggest using Scala to access it. Here is some example code that pulls out the operation metrics.
First we make a trial merge to test with. Here firstDelta is just 1000 rows, with values 1 to 1000.
%python
from delta.tables import DeltaTable

# Target table: 1000 rows with values 1 to 1000
firstDelta = DeltaTable.forName(spark, "firstDF")
# Source: ids 998 to 1003, so 1001-1003 are new
secondDF = spark.range(998, 1004)
firstDelta.alias("first").merge(
    secondDF.alias("second"),
    "first.id = second.id") \
  .whenNotMatchedInsertAll() \
  .execute()
Next we extract one of the operation metrics from this merge operation:
%scala
import io.delta.tables._

val firstDF = DeltaTable.forName("firstDF")
// history(1) returns the most recent commit; operationMetrics is a map column
val operationMetrics = firstDF.history(1)
  .select("operationMetrics")
  .collect()(0)(0)
  .asInstanceOf[Map[String, String]]
operationMetrics("numTargetRowsInserted")
This returns "3" (the metric values are strings), since 1001, 1002, and 1003 were inserted.
Similarly, you can do this with your Delta table after your updates to the target table.
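Since the original question was about capturing this in a logger, here is a hedged Python sketch of the same idea: the formatting helper is plain Python, and the commented-out cell shows how the metrics map would come from table history (it assumes a SparkSession named `spark`, delta-spark installed, and the `firstDF` table from the example above).

```python
import logging

def log_merge_metrics(logger, metrics,
                      keys=("numTargetRowsInserted",
                            "numTargetRowsUpdated",
                            "numTargetRowsDeleted")):
    """Log selected entries of a Delta operationMetrics map."""
    for key in keys:
        if key in metrics:
            logger.info("merge metric %s=%s", key, metrics[key])

# In a notebook cell, the metrics map comes from table history:
# from delta.tables import DeltaTable
# metrics = (DeltaTable.forName(spark, "firstDF")
#            .history(1).select("operationMetrics").collect()[0][0])
# log_merge_metrics(logging.getLogger("etl"), metrics)
```

Logging right after `.execute()` keeps the metric tied to the commit your job just made, rather than whatever happens to be the latest history entry later on.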
10-21-2021 10:05 AM
Hi @John Smith,
Please make sure to select @Dan Zafar's response as the best answer if this post solved your question. It will move the post to the top and help solve future questions from other customers.
Thank you.
10-22-2021 11:23 AM
@Dan Zafar Thank you, I will try this.
10-19-2021 01:25 AM
Good one, Dan! I never thought of using the Delta API for this, but there you go.