Hi,
maybe someone can help me i do want to run very narrow query
SELECT *
FROM my_table
WHERE snapshot_date IN('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snapshot_date#282634 as string) IN (2023-01-06,2033-01-07)]
it is very fast ~1s, in the execution plan i see it is using those provided datasets as arguments for partition filters.
If I try to provide filter predicate in form of the one column table it does full table scan and it takes 100x longer.
%sql
SELECT *
FROM
my_table
WHERE snapshot_date IN (
SELECT snapshot_date
FROM (VLUES('2023-01-06'), ('2023-01-07')) T(snapshot_date)
)
-- plan
Location: PreparedDeltaFileIndex [dbfs:/...]
ReadSchema: ...
Is there a way to provide hints for parition filters or something to make sure it does not read whole table?