Hello, I have a daily ETL job that appends the previous day's records to a target table. From time to time, however, it produces no output at all.
After investigating, I discovered that one of the source tables is sometimes loaded as empty during execution, so no new records get ingested.
It appears that this dataset is being read from the cache; I found evidence of this in the Spark UI. This is surprising because we never call `.persist` or `.cache` on any dataset, so the caching must be happening automatically.
To me, it seems that Spark attempts to load these records from Parquet but instead retrieves them from the cache, which returns an empty dataset.
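For context, this is roughly what the job does (table paths, column names, and the date logic below are simplified placeholders, not the exact production code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("daily-etl").getOrCreate()

// Source Delta table that is sometimes read back as empty.
val source = spark.read.format("delta").load("/mnt/data/source_table")

// Keep only yesterday's records; note there is no .cache()/.persist() anywhere in the job.
val yesterday = java.time.LocalDate.now().minusDays(1).toString
val newRecords = source.filter(col("event_date") === yesterday)

// Append the daily slice to the target Delta table.
newRecords.write
  .format("delta")
  .mode("append")
  .save("/mnt/data/target_table")
```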

1) Is there a chance that an automatic caching mechanism is interfering with my dataset?
2) Is there a chance that the cache contains version X of the Delta table, while storage already has version X+1?
3) Why is Spark reading an empty dataset from the cache? I assumed that if a given DataFrame is not present in the cache, it would simply be reloaded from storage; I did not expect `InMemoryTableScan` to return an empty relation.
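
For reference, this is roughly how I inspect the plan and how I would try to force a fresh read (again a simplified sketch, with the same placeholder path; `df` stands for the dataset that comes back empty):

```scala
// Re-read the same Delta table that intermittently comes back empty.
val df = spark.read.format("delta").load("/mnt/data/source_table")

// The physical plan is where I see InMemoryTableScan instead of a plain Parquet/Delta scan.
df.explain(true)

// Dropping all cached data should force the next action to go back to storage.
spark.catalog.clearCache()
```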