<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Capture num_affected_rows in notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13295#M7996</link>
    <description>&lt;P&gt;To expand on werners's answer, you can use the Delta API to get this information. I suggest you use Scala to access it. Here is some example code that pulls it out.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First we make a trial merge to test with. Here firstDelta is a table of just 1000 rows, with id values 1 to 1000.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%python
from delta.tables import DeltaTable

# Target Delta table, looked up by name
firstDelta = DeltaTable.forName(spark, "firstDF")
# Source rows with ids 998 through 1003 (the upper bound is exclusive)
secondDF = spark.range(998, 1004)

# Insert only the source rows whose id is not already in the target
firstDelta.alias("first").merge(
    secondDF.alias("second"),
    "first.id = second.id") \
  .whenNotMatchedInsertAll() \
  .execute()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Next we extract one of the operation metrics from this merge operation:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%scala
import io.delta.tables._

val firstDF = DeltaTable.forName("firstDF")
// history(1) returns only the most recent commit, which here is the merge
val operationMetrics = firstDF.history(1).select("operationMetrics").collect()(0)(0).asInstanceOf[Map[String,String]]

// Number of rows the merge inserted into the target table
operationMetrics("numTargetRowsInserted")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This returns 3 (as a string), because 1001, 1002, and 1003 were added.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Similarly, you can do this with your own Delta table right after you update the target table.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Oct 2021 19:30:37 GMT</pubDate>
    <dc:creator>Dan_Z</dc:creator>
    <dc:date>2021-10-18T19:30:37Z</dc:date>
    <item>
      <title>Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13293#M7994</link>
      <description>&lt;P&gt;When I run some code, say an ETL process that migrates data from bronze to silver storage, the cell reports num_affected_rows in a table format when it executes. I want to capture that value and log it in my logger. Is it stored in a variable or written to a system log somewhere?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 01:36:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13293#M7994</guid>
      <dc:creator>BigJay</dc:creator>
      <dc:date>2021-10-15T01:36:12Z</dc:date>
    </item>
    <item>
      <title>Re: Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13294#M7995</link>
      <description>&lt;P&gt;AFAIK plain Spark does not report num_affected_rows. I assume you are running Delta Lake operations.&lt;/P&gt;&lt;P&gt;You can fetch this from the JSON files stored in the table's _delta_log folder.&lt;/P&gt;&lt;P&gt;In those files there is a member called 'operationMetrics'.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log" target="test_blank"&gt;https://databricks.com/discover/diving-into-delta-lake-talks/unpacking-transaction-log&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Excellent video on how the Delta Lake transaction log works.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 07:30:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13294#M7995</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-15T07:30:18Z</dc:date>
    </item>
    <item>
      <title>Re: Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13295#M7996</link>
      <description>&lt;P&gt;To expand on werners's answer, you can use the Delta API to get this information. I suggest you use Scala to access it. Here is some example code that pulls it out.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First we make a trial merge to test with. Here firstDelta is a table of just 1000 rows, with id values 1 to 1000.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%python
from delta.tables import DeltaTable

# Target Delta table, looked up by name
firstDelta = DeltaTable.forName(spark, "firstDF")
# Source rows with ids 998 through 1003 (the upper bound is exclusive)
secondDF = spark.range(998, 1004)

# Insert only the source rows whose id is not already in the target
firstDelta.alias("first").merge(
    secondDF.alias("second"),
    "first.id = second.id") \
  .whenNotMatchedInsertAll() \
  .execute()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Next we extract one of the operation metrics from this merge operation:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%scala
import io.delta.tables._

val firstDF = DeltaTable.forName("firstDF")
// history(1) returns only the most recent commit, which here is the merge
val operationMetrics = firstDF.history(1).select("operationMetrics").collect()(0)(0).asInstanceOf[Map[String,String]]

// Number of rows the merge inserted into the target table
operationMetrics("numTargetRowsInserted")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This returns 3 (as a string), because 1001, 1002, and 1003 were added.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Similarly, you can do this with your own Delta table right after you update the target table.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Oct 2021 19:30:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13295#M7996</guid>
      <dc:creator>Dan_Z</dc:creator>
      <dc:date>2021-10-18T19:30:37Z</dc:date>
    </item>
    <item>
      <title>Re: Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13296#M7997</link>
      <description>&lt;P&gt;Good one, Dan! I never thought of using the Delta API for this, but there you go.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Oct 2021 08:25:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13296#M7997</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-19T08:25:56Z</dc:date>
    </item>
    <item>
      <title>Re: Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13297#M7998</link>
      <description>&lt;P&gt;Hi @John Smith,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please make sure to select @Dan Zafar's response as the best answer if this post answered your question. It will move the post to the top and help answer future questions from other customers.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Oct 2021 17:05:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13297#M7998</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-21T17:05:52Z</dc:date>
    </item>
    <item>
      <title>Re: Capture num_affected_rows in notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13298#M7999</link>
      <description>&lt;P&gt;@Dan Zafar Thank you, I will try this.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Oct 2021 18:23:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/capture-num-affected-rows-in-notebooks/m-p/13298#M7999</guid>
      <dc:creator>BigJay</dc:creator>
      <dc:date>2021-10-22T18:23:33Z</dc:date>
    </item>
  </channel>
</rss>

