Databricks inconsistent count and select

shanisolomon
New Contributor II

Hi, 

I have a table with 2 versions:

1. Add txn: path = "a.parquet" numRecords = 10 deletionVector = null

2. Add txn: path = "a.parquet" numRecords = 10 deletionVector = (..., cardinality = 2)

Please note both transactions point to the same physical path ("a.parquet"), without any remove transaction.

As I understand the Delta protocol, these are two separate logical files committed in two different versions, so the above describes a valid Delta table.

When querying the table in Databricks, I'm seeing inconsistent results:

1. Select * from table - returns 18 rows (as expected)

2. Select count(*) from table - returns count = 8

It seems like count(*) is ignoring the first add transaction.
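To make the arithmetic concrete, here is a minimal Python sketch of the two numbers. It is only a model of the log entries above, not Databricks code: the `AddAction` class and its field names are hypothetical, and the "keyed by path" step is just one guess at how a result of 8 could arise, not a confirmed explanation of how count(*) is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddAction:
    """Hypothetical stand-in for a Delta log add entry (illustration only)."""
    path: str
    num_records: int
    dv_cardinality: Optional[int]  # None means no deletion vector

    def visible_rows(self) -> int:
        # Rows left after subtracting the rows marked deleted by the deletion vector.
        return self.num_records - (self.dv_cardinality or 0)


# The two add entries from the post, in version order.
log = [
    AddAction("a.parquet", 10, None),
    AddAction("a.parquet", 10, 2),
]

# Treat both entries as separate logical files: 10 + (10 - 2).
rows_all_entries = sum(a.visible_rows() for a in log)

# Assumption: keep only the latest entry per physical path, so the
# first add is overwritten by the second: 10 - 2.
latest_by_path = {a.path: a for a in log}
rows_path_keyed = sum(a.visible_rows() for a in latest_by_path.values())

print(rows_all_entries, rows_path_keyed)  # 18 8
```

The first total matches the Select * result and the second matches the count(*) result, which is consistent with the first add entry being dropped somewhere in the count path.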

Could you please confirm? Is there indeed a bug in the count(*) calculation?

Thanks,

Shani

Walter_C
Databricks Employee

Hello, the behavior you observed does seem inconsistent with the expected behavior in Delta. Do you have a support contract, so that you can open a support ticket and have this analyzed further?

Thanks, Walter.

I do not have a support contract. I observed this bug on Azure Databricks and wanted to bring it to your attention.