Hi all
To read from a large Delta table, I'm using readStream with trigger(availableNow=True), since I only want to run the job once a day. This worked well for the initial load and for the incremental loads after that.
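For context, the setup is roughly along these lines. This is a minimal sketch, not the exact job: the function name and the three path arguments are placeholders, and it assumes a Spark session with Delta Lake support and a PySpark version that accepts availableNow in trigger().

```python
def run_incremental_load(spark, source_path, target_path, checkpoint_path):
    """Process whatever is new in the source Delta table, then stop.

    All paths are hypothetical placeholders. `spark` is an existing
    SparkSession configured with Delta Lake support.
    """
    # Streaming read from the source Delta table.
    stream = (
        spark.readStream
        .format("delta")
        .load(source_path)
    )

    # availableNow=True processes the data available at start and then
    # stops, so the job can be scheduled once a day instead of running
    # continuously. The checkpoint records how far the last run got.
    query = (
        stream.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .start(target_path)
    )

    # Block until the available data has been processed.
    query.awaitTermination()
    return query
```

Each daily run calls this once with the same checkpoint location, so the stream picks up from where the previous run stopped.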
At some point, though, I received an error from the source Delta table saying that a parquet file referenced by the table's transaction log is no longer available.
I know that a VACUUM command is run periodically against the source table, but with the default retention period of 7 days.
My incremental load was not executed for 2 weeks. Could that be a problem?
How exactly does readStream work? If it last ran 2 weeks ago, will it try to read all table versions committed since then? That could explain the error, as it would reference parquet files older than 7 days.
Thanks