Hi, I have Databricks running on AWS, and I'm looking for a way to know when it's a good time to run OPTIMIZE on partitioned tables. Given that it's an expensive process, especially on big tables, how can I tell whether it's worth running, and on which tables? Maybe I could create a script that executes once a week and runs the OPTIMIZE command only when it's needed.
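One possible heuristic for such a script, sketched below: read the file statistics that `DESCRIBE DETAIL <table>` returns for a Delta table (`numFiles`, `sizeInBytes`) and only trigger OPTIMIZE when the average file size falls well below the target file size. The threshold values here are illustrative assumptions to tune, not Databricks recommendations.

```python
# Sketch of a weekly "should we OPTIMIZE?" check. On Databricks, numFiles and
# sizeInBytes would come from `DESCRIBE DETAIL <table>`; the thresholds below
# are assumed values for illustration, not official defaults.

TARGET_FILE_SIZE = 1024 * 1024 * 1024  # Delta's default OPTIMIZE target (~1 GB)
MIN_FILES = 50                          # skip tiny tables (assumed cutoff)

def needs_optimize(num_files: int, size_in_bytes: int,
                   small_file_ratio: float = 0.25) -> bool:
    """Return True when the table looks fragmented enough to justify the
    cost of OPTIMIZE: many files whose average size is far below target."""
    if num_files < MIN_FILES:
        return False
    avg_file_size = size_in_bytes / num_files
    return avg_file_size < TARGET_FILE_SIZE * small_file_ratio

# 400 files averaging ~10 MB each -> fragmented, worth compacting.
print(needs_optimize(400, 400 * 10 * 1024 * 1024))   # True
# 60 files averaging ~900 MB each -> already well compacted.
print(needs_optimize(60, 60 * 900 * 1024 * 1024))    # False
```

In the actual job you would fill in the numbers per table, e.g. `row = spark.sql(f"DESCRIBE DETAIL {table}").first()` and then `spark.sql(f"OPTIMIZE {table}")` whenever `needs_optimize(row.numFiles, row.sizeInBytes)` is true.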
Also, is it a recommended practice to run OPTIMIZE for each partition in the same job that creates the partition?
For example, I create the 2022-02-12 partition and then run OPTIMIZE with WHERE partition = '2022-02-12'. I'm not sure about this because of the "eventual consistency" of S3: maybe the files are not ready as soon as they are loaded, and something could become corrupted.
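For reference, a partition-scoped OPTIMIZE with a WHERE clause on a partition column only rewrites the files under the matching partition, so the per-partition job described above stays much cheaper than a full-table run. A minimal sketch of building such a statement (table and column names are placeholders, not from any real schema):

```python
def partition_optimize_sql(table: str, partition_col: str, value: str) -> str:
    # OPTIMIZE restricted by a partition predicate compacts only that
    # partition's small files instead of rewriting the whole table.
    return f"OPTIMIZE {table} WHERE {partition_col} = '{value}'"

# e.g. run right after loading the 2022-02-12 partition:
print(partition_optimize_sql("my_table", "date", "2022-02-12"))
# OPTIMIZE my_table WHERE date = '2022-02-12'
```

On Databricks this string would be executed with `spark.sql(...)` as the final step of the load job.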
Thanks!