How to get autoloader to load files in order

159312
New Contributor III

I'm new to Spark and Databricks, and I'm trying to write a pipeline that ingests CDC data from a Postgres database stored in S3. The file names are numerically ascending unique IDs based on datetime (i.e., 20220630-215325970.csv). Right now Auto Loader seems to fetch all files at the source in random order, which means that updates to rows in the DB may not be applied in the correct order.

I have attached a screenshot with an example. Updates 1, 2, and 3 were entered sequentially after all other displayed records, but they appear in the df in the order shown.

I've tried using latestFirst to see if I can get the files processed in a predictable order but that option doesn't seem to have any effect.

Is there a way to load and write files in order by file name using Auto Loader?

Thanks,

Ben

Accepted Solutions

Noopur_Nigam
Valued Contributor II

Hi @Ben Bogart, for files whose names are generated in lexicographic order, Auto Loader can take advantage of lexical file ordering and optimized listing APIs. For more information on lexical ordering, see: https://docs.databricks.com/ingestion/auto-loader/file-detection-modes.html#lexical-ordering-of-file...

Because Spark is a distributed system, no ordering beyond the above is guaranteed.
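
Since processing order is not guaranteed, one possible workaround is to make the result independent of arrival order: carry the source file name with each row and, for every primary key, keep only the change from the lexically greatest file name. The sketch below shows that idea in plain Python, not with any Auto Loader API. The record shape, the function name, and every file name except 20220630-215325970.csv are assumptions made for illustration. It relies on the file names being fixed-width timestamps, so that string order matches chronological order. In a Spark pipeline the same logic would probably run per micro-batch (for example inside a foreachBatch handler) before merging into the target table, but that part is not shown here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (source_file, line_no, primary_key, payload).
# source_file is the CDC file the row came from; line_no is its position in that file.
Change = Tuple[str, int, int, str]


def latest_change_per_key(changes: Iterable[Change]) -> Dict[int, Change]:
    """Return, for each primary key, the change with the greatest (file name, line number)."""
    latest: Dict[int, Change] = {}
    for change in changes:
        source_file, line_no, key, _payload = change
        current = latest.get(key)
        # Fixed-width timestamp file names compare chronologically as plain strings.
        if current is None or (source_file, line_no) > (current[0], current[1]):
            latest[key] = change
    return latest


# Rows deliberately listed out of order, as they might arrive in a batch.
batch = [
    ("20220630-215325970.csv", 1, 7, "update 3"),
    ("20220630-215301120.csv", 1, 7, "update 1"),
    ("20220630-215310455.csv", 1, 7, "update 2"),
    ("20220630-215301120.csv", 2, 9, "insert"),
]

result = latest_change_per_key(batch)
for key in sorted(result):
    print(key, result[key][3])
```

Because the comparison uses the file name and line number rather than arrival order, shuffling the batch gives the same result. This only resolves ordering within the set of rows passed in; changes that arrive in a later batch with an older file name would still need a check against the target table.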
