numSourceRows greater than expected

rbricks
New Contributor

Hey

I am upserting a source DataFrame into a target table. Before the upsert, I print the source DataFrame's row count, which is slightly smaller than the `numSourceRows` value I see in operationMetrics after the operation completes. Two possible explanations occurred to me:

  • The matching condition is matching more than once (it isn't, I checked, and according to the docs this shouldn't affect the field anyway).
  • Some rows are being rewritten because they are stored in the same page as the rows that were actually modified (this doesn't make sense either).

What situations might cause this?
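
For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration, not my actual job: the helper name `compare_source_count`, the table name `target`, and the sample numbers are all made up, and the Spark calls are left as comments so the snippet runs without a cluster.

```python
def compare_source_count(source_count, operation_metrics):
    """Return (reported numSourceRows, difference from the pre-merge count)."""
    # Delta table history exposes operationMetrics as a string-to-string map,
    # so the value is converted to int before comparing.
    reported = int(operation_metrics["numSourceRows"])
    return reported, reported - source_count


# With Spark and Delta Lake available, the inputs might be obtained like this
# ("target" is a hypothetical table name):
#   source_count = source_df.count()
#   ... run the merge ...
#   metrics = DeltaTable.forName(spark, "target").history(1).collect()[0]["operationMetrics"]

# Hypothetical values for illustration only, not my real numbers.
metrics = {"numSourceRows": "1050", "numTargetRowsUpdated": "1000", "numTargetRowsInserted": "40"}
reported, diff = compare_source_count(1000, metrics)
print(f"numSourceRows={reported}, pre-merge count=1000, difference={diff}")
```

A positive difference here corresponds to the situation I am seeing.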

2 REPLIES

-werners-
Esteemed Contributor III

That is indeed interesting; I have never looked into it.

I just searched the Delta GitHub repository and found some commits showing that there is more to it than a simple count:

https://github.com/delta-io/delta/commit/d2804cb92a7e36863144c7be9c55df1c6f1c1a1e

https://github.com/delta-io/delta/commit/8624b92ddd8d47f98e91b88b19b6d4af2e09033b

jose_gonzalez
Moderator

Could you share your code snippet, please? Also share the expected output.
