While we evaluate moving our many autoloader configurations to use `cloudFiles.cleanSource`, we're wondering if we can instead just implement a lifecycle policy outside of Databricks that deletes files older than 30 days.
Is there a problem with doing this? For example, if the Azure storage account lifecycle policy deletes files older than 30 days while autoloader is running, is that a problem? Our autoloader configuration uses directory full listing mode with thousands of files arriving per day, and we only just realized how long autoloader spends listing files it has already processed, not to mention the storage costs we're paying.
We're trying to migrate to the cleanSource option, but in the meantime it is much faster for us to implement a lifecycle policy across all of our storage accounts. Is that a viable stopgap while we work on migrating to the built-in cleanSource capability?
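For context, this is roughly the shape of the config we're migrating toward. A minimal sketch only: the option names are taken from the Databricks Auto Loader docs, but the source format, retention value, and path are placeholders, not our actual settings.

```python
# Hypothetical target Auto Loader config using cleanSource instead of an
# external lifecycle policy. "DELETE" removes processed files after the
# retention window; "MOVE" plus cloudFiles.cleanSource.moveDestination
# archives them instead.
autoloader_options = {
    "cloudFiles.format": "json",  # placeholder source format
    "cloudFiles.cleanSource": "DELETE",
    # Kept at 30 days to mirror the lifecycle policy we'd otherwise use:
    "cloudFiles.cleanSource.retentionDuration": "30 days",
}

# In a Databricks notebook this would be applied roughly like:
#
# df = (spark.readStream
#       .format("cloudFiles")
#       .options(**autoloader_options)
#       .load("abfss://container@account.dfs.core.windows.net/path"))
```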
Thank you,
Nathan