Hi,
I have a Delta table with two versions:
1. add action: path = "a.parquet", numRecords = 10, deletionVector = null
2. add action: path = "a.parquet", numRecords = 10, deletionVector = (..., cardinality = 2)
Note that both add actions point to the same physical path ("a.parquet"), and there is no remove action in between.
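From my understanding of the Delta protocol, a logical file is identified by its path together with its deletion vector's unique id. The two actions above are therefore two separate logical files added in two different versions, and the log describes a valid Delta table.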
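To make that reasoning concrete, here is a minimal Python sketch of log replay. It assumes the protocol rule that live files are keyed by (path, DV unique id), with an empty id when there is no DV. `AddAction`, `reconcile` and the id `"dv-1"` are hypothetical names for illustration. The real DV descriptor fields stay elided as in the post, and remove actions are omitted because the table has none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddAction:
    path: str
    num_records: int
    dv_id: Optional[str] = None  # hypothetical stand-in for the DV's unique id; None = no DV
    dv_cardinality: int = 0      # number of rows the DV marks as deleted


def reconcile(log):
    """Replay add actions in commit order, keyed by (path, DV id); the last action per key wins."""
    live = {}
    for version in log:
        for add in version:
            live[(add.path, add.dv_id or "")] = add
    return live


log = [
    [AddAction("a.parquet", 10)],                # version 1: no DV
    [AddAction("a.parquet", 10, "dv-1", 2)],     # version 2: same path, DV deleting 2 rows
]

files = reconcile(log)
total = sum(a.num_records - a.dv_cardinality for a in files.values())
print(len(files), total)  # 2 18
```

Under this keying the two adds do not collide, so the expected row count is 10 + (10 - 2) = 18.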
When querying the table on Databricks, I'm seeing inconsistent results:
1. SELECT * FROM table returns 18 rows (10 + 8, as expected)
2. SELECT count(*) FROM table returns 8
It seems that count(*) is ignoring the first add action.
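One hypothetical way to arrive at exactly 8: if the count were computed from per-file record counts with live files keyed by path alone, the second add would overwrite the first. This is a guess at the mechanism, not Databricks' actual implementation. The function `count_rows_keyed_by_path` is a made-up name for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddAction:
    path: str
    num_records: int
    dv_cardinality: int = 0  # rows marked deleted by the DV (0 = no DV)


def count_rows_keyed_by_path(log):
    # Hypothetical: keying live files by path only, so a later add for the
    # same path silently replaces the earlier one instead of coexisting with it.
    live = {}
    for version in log:
        for add in version:
            live[add.path] = add
    return sum(a.num_records - a.dv_cardinality for a in live.values())


log = [
    [AddAction("a.parquet", 10)],     # version 1: no DV
    [AddAction("a.parquet", 10, 2)],  # version 2: DV deleting 2 rows
]
print(count_rows_keyed_by_path(log))  # 8
```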
Could you please clarify? Is this indeed a bug in the count(*) calculation?
Thanks,
Shani