Partition filter is skipped when table is used in where condition, why?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-12-2023 01:24 AM
Hi,
maybe someone can help me i do want to run very narrow query
SELECT *
FROM my_table
WHERE snapshot_date IN('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snapshot_date#282634 as string) IN (2023-01-06,2033-01-07)]
it is very fast ~1s, in the execution plan i see it is using those provided datasets as arguments for partition filters.
If I try to provide filter predicate in form of the one column table it does full table scan and it takes 100x longer.
%sql
SELECT *
FROM
my_table
WHERE snapshot_date IN (
SELECT snapshot_date
FROM (VLUES('2023-01-06'), ('2023-01-07')) T(snapshot_date)
)
-- plan
Location: PreparedDeltaFileIndex [dbfs:/...]
ReadSchema: ...
Is there a way to provide hints for parition filters or something to make sure it does not read whole table?
- Labels:
-
Partition
-
Partition Filter
-
Table
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-12-2023 02:19 AM
No hints on partition pruning afaik.
The reason the partitions were not pruned is because the second query generates a completely different plan.
To be able to filter the partitions, a join first has to happen. And in this case it means the table has to be scanned.

