Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

autoloader data processing

Phani1
Valued Contributor II

 

Hi Team,

Can you share the best practices for designing the autoloader data processing?

We receive data from 30 countries in various files. Currently, we are thinking of using a root folder, i.e. country, with subfolders for the individual countries.

In the Auto Loader script, we plan to set the path to the root folder. Is this a good approach? Please advise on the best way to handle thousands of files.

Regards,

Phani

1 REPLY

szymon_dybczak
Contributor

Hi @Phani1 ,

The folder structure you're planning makes sense to me. Since you've mentioned that there will be thousands of files, the best practice is to use Auto Loader with file notification mode, which scales better than directory listing mode because it consumes cloud storage events instead of repeatedly listing the directories.

 

You can also read the Databricks recommendations:

https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-n...

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html
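To make this concrete, here is a minimal PySpark sketch of a single Auto Loader stream pointed at the root country folder with file notification mode enabled. The storage paths, file format (csv), checkpoint locations, and table name below are assumptions for illustration only; substitute your own.

```python
# Sketch: one Auto Loader stream over a root folder with per-country
# subfolders, in file notification mode. All paths, the file format,
# and the checkpoint/table names are hypothetical placeholders.

ROOT_PATH = "abfss://landing@<storage-account>.dfs.core.windows.net/country/"

autoloader_options = {
    "cloudFiles.format": "csv",                # format of the incoming files
    "cloudFiles.useNotifications": "true",     # enable file notification mode
    "cloudFiles.schemaLocation": "/mnt/checkpoints/country_schema",
    "cloudFiles.maxFilesPerTrigger": "1000",   # throttle very large backlogs
}

def start_country_stream(spark):
    """Start one stream over the root folder; Auto Loader picks up new
    files in every country subfolder automatically."""
    # Imported lazily so this sketch stays importable without a Spark cluster.
    from pyspark.sql.functions import col, regexp_extract

    df = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options)
        .load(ROOT_PATH)
        # Derive a 'country' column from the file path,
        # e.g. .../country/DE/2024/file.csv -> "DE"
        .withColumn("source_file", col("_metadata.file_path"))
        .withColumn(
            "country",
            regexp_extract(col("_metadata.file_path"), r"/country/([^/]+)/", 1),
        )
    )
    return (
        df.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/country_data")
        .trigger(availableNow=True)
        .toTable("bronze.country_data")
    )
```

Pointing a single stream at the root folder (rather than one stream per country) keeps the number of streams and checkpoints manageable, while the derived `country` column preserves the per-country lineage downstream.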

 
