Data Engineering
Auto Loader and source file structure optimisation

ilarsen
Contributor

Hi. I have a question that I haven't been able to find an answer to. I'm sure there is one; I just haven't found it by searching and browsing the docs.

How much does it matter (if it is indeed that simple) whether the source files read by Auto Loader sit in a single flat folder or are organised into date-based subfolders (e.g. YYYY/MM/DD)?

My environment is Azure Databricks and ADLS Gen2 (with hierarchical namespace enabled). I have four "folders", one per POST API method, and each contains every file we've ever received from that method. The landing process was not set up to create date-based subfolders, so each folder currently holds anywhere from under 1 million to over 5 million files, depending on the method.

I need to migrate this data, which is what prompts the question: is it worth the effort of copying it into a date-based structure because that will make Auto Loader ingestion more efficient, or should I just copy it over as-is and carry on with life?
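If the files are restructured during the migration, the mapping from each flat folder to a date-based layout can be planned before anything is copied. The sketch below is a minimal illustration in plain Python under stated assumptions: each file's date is taken from its last-modified timestamp (seconds since the epoch, interpreted as UTC), paths use forward slashes as in ADLS Gen2, and the function names and example paths are hypothetical. The copy step itself is left out, and nothing here measures Auto Loader performance.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Iterable


def dated_destination(src_path: str, modified_ts: float, dest_root: str) -> str:
    """Hypothetical helper: map a flat-folder file to dest_root/YYYY/MM/DD/<name>.

    Assumption: the date is derived from the file's last-modified timestamp in UTC.
    """
    name = PurePosixPath(src_path).name
    d = datetime.fromtimestamp(modified_ts, tz=timezone.utc)
    return f"{dest_root.rstrip('/')}/{d:%Y}/{d:%m}/{d:%d}/{name}"


def build_copy_plan(
    files: Iterable[tuple[str, float]], dest_root: str
) -> list[tuple[str, str]]:
    """Hypothetical helper: pair every (path, timestamp) with its dated destination."""
    return [(src, dated_destination(src, ts, dest_root)) for src, ts in files]


def files_per_folder(plan: list[tuple[str, str]]) -> Counter:
    """Count how many files would land in each YYYY/MM/DD destination folder."""
    return Counter(str(PurePosixPath(dst).parent) for _, dst in plan)


if __name__ == "__main__":
    # Example inputs are made up for illustration only.
    plan = build_copy_plan(
        [
            ("landing/method_a/a.json", 1700000000),  # 2023-11-14 22:13:20 UTC
            ("landing/method_a/b.json", 1700003600),  # one hour later, same day
            ("landing/method_a/c.json", 1700100000),  # 2023-11-16 02:00:00 UTC
        ],
        "migrated/method_a",
    )
    for src, dst in plan:
        print(src, "->", dst)
    print(files_per_folder(plan))
```

Reviewing the per-folder counts before copying gives a quick sense of how evenly the files would spread across day folders.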

Thanks for your response, that does help. From what I found (or rather, didn't find), it didn't seem to me that it would have a huge performance impact either. A full-scale test would perhaps be the only way to know for sure, but that may not be worth the effort. The flat file structure is historical now; a new process lands these files in a subfolder structure.

That said, I'd still be interested if anyone who comes across this can shed more light on the potential performance impact of flat versus hierarchical source folder structures for Auto Loader ingestion.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group