Capture num_affected_rows in notebooks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-14-2021 06:36 PM
If I run some code, say for an ETL process to migrate data from bronze to silver storage, when a cell executes it reports num_affected_rows in a table format. I want to capture that and log it in my logger. Is it stored in a variable or syslogged somewhere?
- Labels:
-
ETL Process
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-15-2021 12:30 AM
AFAIK common spark does not have this num_affected_rows. I assume you execute delta lake actions.
You can fetch this from the json files stored in the _delta lake folder.
In those files there is a member called 'operationmetrics'.
https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log
Excellent video on how the delta lake transaction log works.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-18-2021 12:30 PM
To expand on werners's answer, you can use the Delta API to get this information. I suggest you use scala to access it. Here is some example code that would pull out
First we make a trial merge to test with. Here firstDelta is just 1000 rows, with values 1 to 1000.
%python
from delta.tables import DeltaTable
firstDelta = DeltaTable.forName(spark, "firstDF")
secondDF = spark.range(998, 1004)
firstDelta.alias("first").merge(
secondDF.alias("second"),
"first.id = second.id") \
.whenNotMatchedInsertAll() \
.execute()Next we extract one of the operation metrics from this merge operation:
%scala
import io.delta.tables._
val firstDF = DeltaTable.forName("firstDF")
val operationMetrics = firstDF.history(1).select("operationMetrics").collect()(0)(0).asInstanceOf[Map[String,String]]
operationMetrics("numTargetRowsInserted")This returns 3, since 1001 , 1002, and 1003 were added.
Similarly, you can do this with your Delta table after your updates to the target table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-21-2021 10:05 AM
Hi @John Smith,
Please make sure to select @Dan Zafar response as best answer if this post solved your question. It will move the post to the top and it will help to solve future questions from other customer.
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-22-2021 11:23 AM
@Dan Zafar Thank you, i will try this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-19-2021 01:25 AM
Good one Dan! I never thought of using the delta api for this but there you go.