Auto Loader data processing
08-19-2024 10:42 PM
Hi Team,
Can you share best practices for designing Auto Loader data processing?
We have data from 30 countries arriving in various files. We are currently thinking of using a root folder, i.e. country, with a subfolder for each individual country.
In the Auto Loader script, we plan to set the path to the root folder. Is this a good approach? Please advise on the best way to handle thousands of files.
Regards,
Phani
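A minimal Python sketch of this layout, assuming files land as <root>/<country>/<file>. The root path /mnt/raw/country and the helper names are hypothetical. It shows how a single stream pointed at the root folder can still recover which country each file belongs to, and how to check the per-country file counts:

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical landing root; substitute your own storage path.
ROOT = "/mnt/raw/country"


def country_from_path(file_path: str, root: str = ROOT) -> str:
    """Return the country subfolder for a file stored as <root>/<country>/<file>."""
    rel = PurePosixPath(file_path).relative_to(root)  # raises ValueError if outside root
    if len(rel.parts) < 2:
        # A file sitting directly in the root has no country subfolder.
        raise ValueError(f"{file_path} is not inside a country subfolder")
    return rel.parts[0]


def files_per_country(paths, root: str = ROOT) -> Counter:
    """Count incoming files per country subfolder."""
    return Counter(country_from_path(p, root) for p in paths)


if __name__ == "__main__":
    sample = [
        f"{ROOT}/IN/sales_2024-08-19.csv",
        f"{ROOT}/IN/sales_2024-08-20.csv",
        f"{ROOT}/DE/sales_2024-08-19.csv",
    ]
    print(files_per_country(sample))
```

Inside the Auto Loader stream itself, the same extraction could be applied to the source file path column (for example _metadata.file_path on recent Databricks runtimes) with a regular expression, so one stream on the root folder can serve all 30 countries instead of one stream per country.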
- Labels: Delta
08-19-2024 11:04 PM
Hi @Phani1,
The folder structure you plan to use makes sense to me. Since you've mentioned there will be thousands of files, the best practice is to use Auto Loader in file notification mode, which consumes file events from the cloud storage service instead of repeatedly listing the input directories.
Also, you can read Databricks' recommendations for running Auto Loader in production:
https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html
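A minimal sketch of the file notification setup, assuming a Databricks runtime with Auto Loader. cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.useNotifications and cloudFiles.maxFilesPerTrigger are Auto Loader options; the helper names, paths, table name and the batch size of 1000 are hypothetical. Notification mode also needs permissions (or cloud-specific options) so Auto Loader can set up the event services, so check the linked documentation for your cloud:

```python
def autoloader_options(file_format: str, schema_location: str,
                       use_notifications: bool = True,
                       max_files_per_trigger: int = 1000) -> dict:
    """Build cloudFiles options for an Auto Loader stream (all values as strings)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # File notification mode: react to storage events instead of listing directories.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Cap how many new files each micro-batch picks up (hypothetical value).
        "cloudFiles.maxFilesPerTrigger": str(max_files_per_trigger),
    }


def start_stream(spark, source_root: str, target_table: str,
                 checkpoint: str, opts: dict):
    """Start one stream over the whole root folder; needs a Databricks runtime."""
    return (spark.readStream.format("cloudFiles")
            .options(**opts)
            .load(source_root)  # e.g. the hypothetical "/mnt/raw/country"
            .writeStream
            .option("checkpointLocation", checkpoint)
            .toTable(target_table))


if __name__ == "__main__":
    print(autoloader_options("csv", "/mnt/meta/schemas/country"))
```

Keeping one stream on the root folder means a single checkpoint tracks every country's files; splitting into one stream per country is also possible but multiplies the checkpoints and notification resources to manage.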

