Hey @Volker,
First of all, I’d recommend considering Auto Liquid Clustering, as it can simplify the process of defining clustering keys.
You can read more about it in the Databricks documentation. It’s currently in Public Preview, but you can probably start using it already.
Since the official docs are still limited, here’s a quick summary of the criteria used by the backend to trigger Liquid Clustering:
• The table size must be at least 256 MB.
• There must be at least 10 pruning-eligible scans with pruning predicates.
• The clustering key must not have been changed in the last 2 weeks.
• It usually takes 2 to 5 hours for the table to reflect the Liquid Clustering key after the conditions are met.
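To make those thresholds concrete, here is a minimal Python sketch that just restates the three conditions as a check. It is not a Databricks API: `TableStats`, its fields, and `meets_auto_clustering_criteria` are hypothetical names I made up for illustration, and the numbers are simply the ones from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the summary above (not from an official API).
MIN_TABLE_SIZE_BYTES = 256 * 1024 * 1024   # 256 MB
MIN_PRUNING_SCANS = 10
KEY_STABILITY_WINDOW = timedelta(weeks=2)


@dataclass
class TableStats:
    """Hypothetical container for the inputs the criteria mention."""
    size_bytes: int
    pruning_scans: int                     # pruning-eligible scans with pruning predicates
    key_last_changed: Optional[datetime]   # None if the clustering key was never changed


def meets_auto_clustering_criteria(stats: TableStats, now: datetime) -> bool:
    """Return True when all three listed conditions hold."""
    key_is_stable = (
        stats.key_last_changed is None
        or now - stats.key_last_changed >= KEY_STABILITY_WINDOW
    )
    return (
        stats.size_bytes >= MIN_TABLE_SIZE_BYTES
        and stats.pruning_scans >= MIN_PRUNING_SCANS
        and key_is_stable
    )
```

Even when a check like this passes, remember the 2 to 5 hour delay before the key actually shows up on the table.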
Answering your questions:
• Yes, deleted files are removed by VACUUM once they are older than the retention threshold, which is 7 days by default.
• Yes, Liquid Clustering can handle full timestamps like processing_dttm.
However, using a timestamp with minutes and seconds can lead to too many small clusters if the values are highly distinct and that level of granularity isn’t relevant for filtering. In such cases, this may reduce clustering efficiency rather than improve it.
If your queries don’t require that level of precision, I’d recommend filtering on a truncated version of the timestamp (for example, truncated to the hour or day).
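To illustrate why truncation helps, here is a small plain-Python sketch (no Spark needed) that compares the number of distinct values before and after truncating to the hour. The sample data and the `truncate_to_hour` helper are made up for this example; in Spark SQL the equivalent would be something like `date_trunc('HOUR', processing_dttm)`.

```python
from datetime import datetime, timedelta


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Synthetic sample: one event every 7 seconds, starting at midnight.
start = datetime(2024, 1, 1)
events = [start + timedelta(seconds=7 * i) for i in range(10_000)]

# Every raw timestamp is unique, so the raw column has very high cardinality.
distinct_raw = len(set(events))

# After truncation, many events share the same hourly value.
distinct_hourly = len({truncate_to_hour(ts) for ts in events})

print(f"distinct raw timestamps:    {distinct_raw}")
print(f"distinct hourly timestamps: {distinct_hourly}")
```

The fewer distinct values your filters actually rely on, the less the second-level precision of the column matters for how you query it.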
Hope this helps 🙂
Isi