This is from the practice exam for the Databricks Data Engineer Associate certification.
The question is:
A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external system, their queries within Databricks are not returning the new rows. They identify the caching of the previous data as the cause of this issue. Which of the following approaches will ensure that the data returned by queries is always up-to-date?
The options are:
A. The tables should be converted to the Delta format
B. The tables should be stored in a cloud-based external system
C. The tables should be refreshed in the writing cluster before the next query is run
D. The tables should be altered to include metadata to not cache
E. The tables should be updated before the next query is run
The correct answer is given as A, but I chose D.
My understanding is that an external data source cannot guarantee ACID, and that queried data is first fetched from the cache. So it seems the fix should be either to disable the cache or to move the data; merely converting the table format should not help.
Can anyone explain why converting the format to Delta solves the problem?
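For context, here is a sketch of what the two approaches would look like in Spark SQL (the table and path names are hypothetical). As I understand it, `REFRESH TABLE` only invalidates the cache for that one table and must be rerun after every external write, whereas Delta's transaction log is consulted on each query, so Spark can detect newly committed files automatically:

```sql
-- Workaround in the spirit of option D: manually invalidate cached
-- metadata/data before querying (must be repeated after each external append):
REFRESH TABLE my_parquet_table;

-- Option A: convert the external Parquet data in place to Delta format;
-- subsequent queries read the transaction log, so new rows appear
-- without any manual refresh:
CONVERT TO DELTA parquet.`/mnt/external/my_table_path`;
```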