I'm working on a Databricks implementation project where external Kafka processes write JSON files to S3. I need to ingest these files daily, or in some cases every four hours, but I don't need to perform stream processing.
I'm considering two approaches to bring these files into Delta Lake in a Unity Catalog-enabled environment:
1 - Using Auto Loader in batch mode: I could run Auto Loader with a one-shot trigger to ingest these files directly into a Delta bronze layer.
2 - Creating external tables: I could create external tables over these files and use them as the bronze layer.
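To be concrete about option 1, this is roughly what I have in mind: Auto Loader (the `cloudFiles` source) run as a scheduled batch job with `trigger(availableNow=True)`, so it processes only new files and then stops. The bucket paths and table name below are placeholders, not my real ones:

```python
# Hypothetical Auto Loader options for ingesting the Kafka-produced JSON files.
# All paths and the target table name are placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    # Schema tracking location required by Auto Loader for schema inference/evolution:
    "cloudFiles.schemaLocation": "s3://my-bucket/_schemas/bronze_events",
    "cloudFiles.inferColumnTypes": "true",
}

# On Databricks, the batch-style ingestion would look like:
# (spark.readStream
#      .format("cloudFiles")
#      .options(**autoloader_options)
#      .load("s3://my-bucket/kafka-json/")
#      .writeStream
#      .option("checkpointLocation", "s3://my-bucket/_checkpoints/bronze_events")
#      .trigger(availableNow=True)   # process pending files once, then stop
#      .toTable("main.bronze.events"))
```

The idea is that even though the API is `readStream`/`writeStream`, the `availableNow` trigger makes it behave like an incremental batch job I can schedule daily or every four hours.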
Do these approaches make sense?
What are the advantages and disadvantages of each?
Is there any other, better approach?