Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

autoloader data processing

Phani1
Valued Contributor II

 

Hi Team,

Can you share the best practices for designing the autoloader data processing?

We receive data from 30 countries in various files. Currently, we are thinking of using a root folder, i.e. country, with subfolders for the individual countries.

In the Auto Loader script, we plan to set the path to the root folder. Is this a good approach? Please advise on the best way to handle thousands of files.

Regards,

Phani

1 REPLY

szymon_dybczak
Contributor

Hi @Phani1 ,

The folder structure you're planning makes sense to me. Since you've mentioned that there will be thousands of files, the best practice is to use Auto Loader with file notification mode, which scales better than directory listing mode because it consumes cloud storage events instead of repeatedly listing the directories.

 

You can also read the Databricks recommendations:

https://learn.microsoft.com/en-us/azure/databricks/ingestion/cloud-object-storage/auto-loader/file-n...

https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html
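To make this concrete, here is a minimal PySpark sketch of a single Auto Loader stream pointed at the root country folder with file notification mode enabled. The storage paths, file format (csv), checkpoint locations, and table name below are assumptions for illustration only; substitute your own.

```python
# Sketch: one Auto Loader stream over a root folder with per-country
# subfolders, in file notification mode. All paths, the file format,
# and the checkpoint/table names are hypothetical placeholders.

ROOT_PATH = "abfss://landing@<storage-account>.dfs.core.windows.net/country/"

autoloader_options = {
    "cloudFiles.format": "csv",                # format of the incoming files
    "cloudFiles.useNotifications": "true",     # enable file notification mode
    "cloudFiles.schemaLocation": "/mnt/checkpoints/country_schema",
    "cloudFiles.maxFilesPerTrigger": "1000",   # throttle very large backlogs
}

def start_country_stream(spark):
    """Start one stream over the root folder; Auto Loader picks up new
    files in every country subfolder automatically."""
    # Imported lazily so this sketch stays importable without a Spark cluster.
    from pyspark.sql.functions import col, regexp_extract

    df = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options)
        .load(ROOT_PATH)
        # Derive a 'country' column from the file path,
        # e.g. .../country/DE/2024/file.csv -> "DE"
        .withColumn("source_file", col("_metadata.file_path"))
        .withColumn(
            "country",
            regexp_extract(col("_metadata.file_path"), r"/country/([^/]+)/", 1),
        )
    )
    return (
        df.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/country_data")
        .trigger(availableNow=True)
        .toTable("bronze.country_data")
    )
```

Pointing a single stream at the root folder (rather than one stream per country) keeps the number of streams and checkpoints manageable, while the derived `country` column preserves the per-country lineage downstream.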

 
