Best approach for handling batch processes from cloud object storage

alexandrexixe
New Contributor

I'm working on a Databricks implementation project where external Kafka processes write JSON files to S3. I need to ingest these files daily, or in some cases every four hours, but I don't need to perform stream processing.

I'm considering two approaches to bring these files into Delta Lake in a Unity Catalog environment:

1 - Using Auto Loader in batch mode: I could run Auto Loader on a schedule to load only the new files directly into a Delta bronze layer.

2 - Creating external tables: I could create external tables over these files and use them as the bronze layer.
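
To make approach 1 concrete, here is a minimal sketch, assuming a Databricks runtime where the `cloudFiles` source is available. The function names, S3 paths, and table name are hypothetical placeholders, not part of the actual setup. The `availableNow` trigger processes whatever files have arrived since the last run and then stops, so the job can be scheduled daily or every four hours without keeping a stream running.

```python
def autoloader_options(schema_location: str) -> dict:
    """Auto Loader reader options for JSON files (helper name is hypothetical)."""
    return {
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_new_files(spark, source_path: str, target_table: str, checkpoint_path: str):
    """Load files not yet ingested into a Delta table, then stop.

    `spark` is the session provided by the Databricks notebook or job.
    """
    query = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(checkpoint_path))
        .load(source_path)
        .writeStream
        # The checkpoint records which files have already been processed.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then terminate.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    query.awaitTermination()


# Example call with placeholder values (run inside Databricks):
# ingest_new_files(
#     spark,
#     source_path="s3://example-bucket/kafka-json/",
#     target_table="main.bronze.events",
#     checkpoint_path="s3://example-bucket/_checkpoints/events/",
# )
```

Each scheduled run would pick up only the files that landed since the previous run, because the checkpoint tracks what has been ingested.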


Do these approaches make sense?
What are the advantages and disadvantages of each?
Is there a better approach?