4 hours ago
Hey !
I would like to migrate an ADF batch ingestion pipeline that has a TumblingWindowTrigger on top of it. The trigger checks every 15 minutes whether a file has landed. Files normally land on a daily basis, so the pipeline processes one file per day, and the trigger's self-dependency guarantees that the next file to arrive (file + 1) is processed only if the previous one was ingested correctly.
I see that Databricks Workflows has 3 kinds of triggers: Schedule, File arrival and Continuous. Which one is the equivalent of the TumblingWindowTrigger, and how can I set up a self-dependency in order to keep the same approach?
4 hours ago
Hi @fjrodriguez ,
What about using Databricks Auto Loader and triggering the workflow every 15 minutes? Auto Loader automatically detects which new files have arrived since the last run of the job and loads only those new files into the target table. You can use the availableNow trigger option, which consumes all available records as an incremental batch and then stops.
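To make that concrete, here is a minimal sketch of what such a notebook could look like. The function name, the paths, the table name and the CSV file format are all placeholders I made up for illustration, and I'm assuming the schema location can share the checkpoint folder; adjust to your setup.

```python
def ingest_new_files(spark, source_path, checkpoint_path, target_table,
                     file_format="csv"):
    """Load every file not yet recorded in the checkpoint, then stop.

    All arguments are hypothetical placeholders; `spark` is the
    SparkSession that a Databricks notebook provides.
    """
    stream = (
        spark.readStream.format("cloudFiles")          # Auto Loader source
        .option("cloudFiles.format", file_format)      # format of landing files
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
    )
    query = (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # tracks processed files
        .trigger(availableNow=True)                     # drain backlog, then stop
        .toTable(target_table)
    )
    # Block until the backlog is processed; if the stream fails, this
    # raises, which in turn marks the job run as failed.
    query.awaitTermination()
    return query
```

In the notebook you would call it with something like `ingest_new_files(spark, "/landing/path", "/checkpoints/path", "catalog.schema.table")`, where all three values are again just examples.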
So, let's say you prepare a notebook that uses Auto Loader. You then schedule this notebook with Databricks Workflows and set the option Max concurrent runs = 1. This ensures that your job runs every 15 minutes and consumes all new files that appeared within that period, and if processing takes longer than 15 minutes, the next run will not start until the previous one has finished.
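If you prefer to define the job as code rather than through the UI, the settings could look roughly like the sketch below. The helper name, job name, task key and notebook path are hypothetical, and compute settings are left out on purpose; the field names follow the Jobs API 2.1 job settings as far as I know, so please double check them against the API reference.

```python
import json


def build_job_settings(notebook_path, interval_minutes=15, timezone_id="UTC"):
    """Build a Jobs API style settings dict for a scheduled ingestion job.

    Hypothetical helper for illustration only; cluster/compute settings
    are omitted and must be added for a real job.
    """
    if not 1 <= interval_minutes <= 59:
        raise ValueError("interval_minutes must be between 1 and 59")
    return {
        "name": "file-ingestion",            # made-up job name
        "max_concurrent_runs": 1,            # never run two ingestions at once
        "schedule": {
            # Quartz cron: second 0, every N minutes starting at minute 0
            "quartz_cron_expression": f"0 0/{interval_minutes} * * * ?",
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        },
        "tasks": [
            {
                "task_key": "ingest",        # made-up task key
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_job_settings("/Workspace/ingest_notebook"), indent=2))
```

One thing worth checking in your workspace: depending on whether queueing is enabled for the job, a run that would overlap with an active one is either queued or skipped. Either way Auto Loader picks up the pending files on the next run that does start.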
3 hours ago
@szymon_dybczak - so let's assume tomorrow morning's file ingestion fails. What will happen with the next one? I want the next ingestion not to run, and to be held back until the stuck one is fixed.
2 hours ago
With ADF this is straightforward: you re-trigger the run that failed, and once it is fixed the files that were queued behind it are ingested automatically.
2 hours ago
That's the beauty of Auto Loader. It records successfully processed files in the checkpoint location. If processing fails for whatever reason, Auto Loader will try to re-ingest all the files that weren't successfully loaded in the previous run, plus all the new files that have appeared since.
3 hours ago
Hi @szymon_dybczak ,
Sounds reasonable, I will propose this approach. Thanks!