Databricks inconsistent count and select
01-06-2025 03:47 PM
Hi,
I have a table with 2 versions:
1. Add txn: path = "a.parquet" numRecords = 10 deletionVector = null
2. Add txn: path = "a.parquet" numRecords = 10 deletionVector = (..., cardinality = 2)
Please note both transactions point to the same physical path ("a.parquet"), without any remove transaction.
From my understanding of the Delta protocol, these are 2 separate logical files residing in two different versions, so the above describes a valid Delta table.
When querying the table using Databricks, I'm seeing inconsistent results:
1. Select * from table - returns 18 rows (as expected)
2. Select count(*) from table - returns count = 8
It seems like count(*) is ignoring the first add transaction.
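The arithmetic behind the two numbers can be sketched as follows. This is only a minimal illustration of the hypothesis above, not a description of how Databricks actually computes count(*): the `AddAction` class and both helper functions are hypothetical names made up for this example, and the "keep only the last add per path" rule is an assumption about what might be happening.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddAction:
    """Hypothetical, simplified stand-in for a Delta log add action."""
    path: str
    num_records: int
    dv_cardinality: Optional[int]  # None means no deletion vector

    def logical_rows(self) -> int:
        # Rows visible to a reader: physical rows minus rows marked deleted.
        return self.num_records - (self.dv_cardinality or 0)


def count_all_adds(actions):
    # Treat every add action as its own logical file.
    return sum(a.logical_rows() for a in actions)


def count_last_add_per_path(actions):
    # Assumed faulty behavior: a later add with the same path replaces
    # the earlier one, so only the last add per path is counted.
    latest = {}
    for a in actions:
        latest[a.path] = a
    return sum(a.logical_rows() for a in latest.values())


log = [
    AddAction("a.parquet", 10, None),  # version 1
    AddAction("a.parquet", 10, 2),     # version 2
]

print(count_all_adds(log))           # 18, matches Select *
print(count_last_add_per_path(log))  # 8, matches Select count(*)
```

Under these assumptions, counting both adds gives 10 + (10 - 2) = 18, while keeping only the second add gives 10 - 2 = 8, which lines up with the two results I observed.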
Could you please settle this? Is there indeed a bug in how count(*) is calculated?
Thanks,
Shani
01-06-2025 05:42 PM
Hello, the behavior you observed does seem inconsistent with the expected behavior in Delta. Do you have a support contract, so that you can open a support ticket and have this analyzed further?
01-06-2025 05:51 PM
Thanks, Walter.
I do not have a support contract. I observed this bug using Azure Databricks and wanted to bring it to your attention.

