Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
numSourceRows greater than expected

rbricks
New Contributor

Hey

I am doing an upsert (MERGE) of a source DataFrame into a target table. Before the upsert, I print the source DataFrame's row count, and it is slightly smaller than the `numSourceRows` value I find in the `operationMetrics` once the operation completes. Two possible explanations occurred to me:

  • The matching condition is matching some rows more than once. (It isn't; I checked. And according to the docs, that shouldn't affect this metric anyway.)
  • Some rows are being rewritten because they sit in the same page as the rows that were actually modified. (That still doesn't make sense to me.)

What situations might cause this?
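For reference, the comparison described above can be sketched as follows. This is a minimal sketch assuming PySpark with the `delta-spark` package and a single join column; `upsert_and_compare`, `source_row_gap`, `target_path` and `key` are hypothetical names, not the poster's actual code. It counts the source, checks for duplicate join keys (the first suspicion in the list), runs the MERGE through `DeltaTable.merge`, and reads `numSourceRows` from the latest entry of `DeltaTable.history`.

```python
from typing import Dict, Tuple


def source_row_gap(expected_rows: int, operation_metrics: Dict[str, str]) -> int:
    """Return numSourceRows minus the row count taken before the MERGE.

    Delta history exposes operationMetrics as a map of strings, hence int().
    """
    return int(operation_metrics["numSourceRows"]) - expected_rows


def upsert_and_compare(spark, source_df, target_path: str, key: str) -> Tuple[int, int, int]:
    """Upsert source_df into the Delta table at target_path and compare counts.

    Returns (rows counted before MERGE, duplicated source keys, numSourceRows gap).
    `key` is a hypothetical single join column; adapt the condition to the real one.
    """
    # Imported lazily so source_row_gap works without pyspark/delta-spark installed.
    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    # Note: count() and the MERGE below each evaluate source_df separately
    # unless it has been cached.
    expected = source_df.count()

    # Keys that appear more than once in the source could match a target row repeatedly.
    duplicate_keys = source_df.groupBy(key).count().filter(F.col("count") > 1).count()

    target = DeltaTable.forPath(spark, target_path)
    (
        target.alias("t")
        .merge(source_df.alias("s"), f"t.{key} = s.{key}")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # history(1) returns only the latest commit; if another writer committed
    # after the MERGE, that entry is not ours, so check the operation name.
    last = target.history(1).select("operation", "operationMetrics").first()
    if last["operation"] != "MERGE":
        raise RuntimeError(f"latest commit is {last['operation']}, not MERGE")

    return expected, duplicate_keys, source_row_gap(expected, last["operationMetrics"])
```

It would be called as, e.g., `upsert_and_compare(spark, updates_df, "/path/to/target", "id")`, where the path and column name are placeholders. Keeping the metric arithmetic in a separate pure function makes it easy to check against values copied out of `DESCRIBE HISTORY` by hand.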

2 REPLIES

-werners-
Esteemed Contributor III

That is indeed interesting; I've never looked into it.

I just searched the Delta GitHub repository and found some commits showing that there is more to this metric than a simple count:

https://github.com/delta-io/delta/commit/d2804cb92a7e36863144c7be9c55df1c6f1c1a1e

https://github.com/delta-io/delta/commit/8624b92ddd8d47f98e91b88b19b6d4af2e09033b

jose_gonzalez
Databricks Employee

Could you please share your code snippet, along with the expected output?
