I'm new to Spark and Databricks, and I'm trying to write a pipeline that ingests CDC data from a Postgres database that lands in S3. The file names are numerically ascending unique IDs based on datetime (e.g. `20220630-215325970.csv`). Right now Auto Loader seems to fetch the files at the source in random order, which means updates to rows in the DB may not be applied in the correct order.
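For what it's worth, since the names are zero-padded, most-significant-digit-first timestamps, a plain lexicographic sort of the file names already gives chronological order (the names below are made up for illustration):

```python
# Timestamp-based CDC file names: zero-padded, so lexicographic
# order == chronological order.
names = [
    "20220630-215325970.csv",
    "20220629-101500123.csv",
    "20220630-215326001.csv",
]
print(sorted(names))
# ['20220629-101500123.csv', '20220630-215325970.csv', '20220630-215326001.csv']
```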
I have attached a screenshot with an example. Updates 1, 2, and 3 were entered sequentially after all the other displayed records, but they don't appear in the df in that order.
I've tried setting `latestFirst` to see if I can get the files processed in a predictable order, but that option doesn't seem to have any effect.
Is there a way to have Auto Loader load and write the files in order by filename?
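To make the concern concrete, here's a toy sketch in plain Python (not Auto Loader itself; file names and values are made up) of how out-of-order processing corrupts an upsert target:

```python
# Each hypothetical CDC "file" carries an update to the row with id 1.
files = {
    "20220630-215325970.csv": {"id": 1, "name": "update 1"},
    "20220630-215326100.csv": {"id": 1, "name": "update 2"},
    "20220630-215326200.csv": {"id": 1, "name": "update 3"},
}

def apply_updates(order):
    table = {}
    for fname in order:
        row = files[fname]
        table[row["id"]] = row["name"]  # upsert by primary key
    return table

# Processing in file-name (i.e. chronological) order keeps the latest value...
in_order = apply_updates(sorted(files))
print(in_order)   # {1: 'update 3'}

# ...while a shuffled order leaves a stale value behind.
shuffled = apply_updates([
    "20220630-215326200.csv",
    "20220630-215325970.csv",
    "20220630-215326100.csv",
])
print(shuffled)   # {1: 'update 2'}
```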
Thanks,
Ben