@oishimbo @jasbro
This can be tricky because, to achieve this functionality, you need to consider both the VACUUM retention period and the time travel settings together. You can use a query like the one below to read data as of a past point in time, but the VACUUM retention policy must cover the time range you want to access: once VACUUM removes the underlying data files, those historical versions are permanently inaccessible.
-- Example: Query data as of yesterday
SELECT COUNT(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1);
If you want to access data from the last 90 days, set the VACUUM retention to slightly more than 90 days (for example, 92 or 93 days) so a small buffer protects the oldest versions you still need. Note that time travel also depends on the transaction log: `delta.logRetentionDuration` (default 30 days) must be raised as well, since both the log entries and the data files are required to reconstruct an old version.
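As a sketch, both retention properties can be set on the table itself; the table name `my_table` is the one from the examples above, and the exact interval values are an assumption you should tune to your needs:

```sql
-- Keep deleted data files for 93 days (governs what VACUUM may remove)
-- and transaction log entries for 93 days (governs how far back
-- time travel can resolve a version).
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 93 days',
  'delta.logRetentionDuration'         = 'interval 93 days'
);
```

With these properties in place, a plain `VACUUM my_table` respects the 93-day window; passing an explicit shorter `RETAIN` clause would still override it, so avoid doing that on this table.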
-- Example: Query data as of 90 days ago
SELECT COUNT(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 90);
However, instead of querying all historical data directly from Delta Lake in real time, it is recommended to export periodic snapshots to a dedicated snapshot table. This approach gives consistent query performance, removes the risk of VACUUM deleting the data files an old version depends on, and reduces the load on your Delta tables.
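A minimal sketch of that snapshot pattern, assuming a daily scheduled job and a hypothetical target table `my_table_snapshots` (both names are illustrative, not from the original post):

```sql
-- One-time setup: snapshot table partitioned by the snapshot date.
CREATE TABLE IF NOT EXISTS my_table_snapshots
  PARTITIONED BY (snapshot_date)
AS SELECT current_date() AS snapshot_date, *
   FROM my_table WHERE 1 = 0;  -- schema only, no rows

-- Daily job: append today's full state of the source table.
INSERT INTO my_table_snapshots
SELECT current_date() AS snapshot_date, *
FROM my_table;
```

Historical queries then read a partition of the snapshot table (`WHERE snapshot_date = ...`) instead of relying on `TIMESTAMP AS OF`, so they keep working regardless of how aggressively VACUUM is run on the source table.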
Refer to the Databricks blog post on Delta time travel: https://www.databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes....