02-01-2024 10:31 AM
Hi,
I am running Auto Loader with a continuous trigger. How can I stop this trigger during a specific time window, but only if no data is pending and the current batch has finished processing? Also, how can I check how many records are pending in the queue and the current state of the stream?
Regards,
Sanjay
02-02-2024 05:33 AM
You can switch to a 'Triggered' pipeline in this case.
Next, create a job in Workflows and attach a trigger of type 'file arrival' to it. Then add the notebook and cluster to the job. If you're not using DLT, set the cluster's auto-termination timeout to 0 minutes so that it shuts down immediately once it's inactive.
Now, whenever a file arrives in your landing location, the trigger will fire and start the cluster, which will run the notebook until the task finishes.
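The file-arrival setup above corresponds to a `trigger` block in the job's settings (Jobs API 2.1). A rough sketch, with the storage URL as a placeholder, not a value from this thread:

```json
{
  "trigger": {
    "pause_status": "UNPAUSED",
    "file_arrival": {
      "url": "s3://your-bucket/landing/",
      "min_time_between_triggers_seconds": 60
    }
  }
}
```

`min_time_between_triggers_seconds` is an optional throttle so a burst of files starts one run rather than many.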
02-03-2024 06:27 AM
Thank you melbourne. I cannot switch to a triggered pipeline for now. Is it possible to stop/pause the job using the workspace client or the Jobs REST API?
Thanks,
Sanjay
02-08-2024 10:37 PM
Hello, I am new here. Can I ask a question?
02-11-2024 11:22 PM
Hi @sanjay, looking to effectively manage your Auto Loader's continuous trigger? Follow these steps:
Pausing the Trigger at Specific Times: If you need to halt the continuous trigger during certain hours, consider switching to a triggered pipeline. However, if you prefer to stick with the continuous trigger, you can stop it programmatically through the workspace client or the Jobs REST API. Just be sure to handle this approach carefully to avoid any data loss.
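As a sketch of that programmatic stop: the helper below only cancels the run inside a configured quiet window. The window-check function is hypothetical, and the cancel call uses the `databricks-sdk` Python client; the run ID, window hours, and credentials are assumptions, not values from this thread.

```python
from datetime import time as dtime


def in_pause_window(now_t, start=dtime(1, 0), end=dtime(3, 0)):
    """Return True if now_t falls inside the quiet window.

    Handles windows that cross midnight (e.g. 23:00-02:00).
    """
    if start <= end:
        return start <= now_t < end
    return now_t >= start or now_t < end


def stop_run_if_quiet(run_id, now_t):
    """Cancel the job's active run, but only during the quiet window.

    Requires the databricks-sdk package and workspace credentials;
    run_id is a placeholder, not a value from this thread.
    """
    if not in_pause_window(now_t):
        return False
    from databricks.sdk import WorkspaceClient  # needs workspace auth configured
    w = WorkspaceClient()
    w.jobs.cancel_run(run_id=run_id)  # gracefully cancels the active run
    return True
```

Because Structured Streaming resumes from its checkpoint on the next start, cancelling between micro-batches should not lose data as long as checkpointing is configured.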
To keep track of how many records are still waiting, your best bet is to monitor the Structured Streaming query behind your Auto Loader stream. There isn't a direct method to retrieve an exact count, but you can gain insight from the query's progress metrics and checkpoint files. Note that if the job is stopped mid micro-batch, changes to the trigger interval won't take effect until that batch completes, which can itself serve as a rough gauge of pending work. Since Auto Loader runs on Structured Streaming, understanding how triggers work will give you better control over costs and ingestion behavior.
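One way to approximate "no data pending" from those progress metrics: if the last few micro-batches each processed zero input rows, the stream has most likely drained. The helper below works on the plain dicts that `StreamingQuery.recentProgress` exposes (each carries a `numInputRows` field); treating three consecutive empty batches as "drained" is an assumption, not a documented threshold.

```python
def is_drained(recent_progress, empty_batches_required=3):
    """Return True if the last N micro-batches all had zero input rows.

    recent_progress: list of progress dicts, as returned by
    StreamingQuery.recentProgress in PySpark.
    """
    if len(recent_progress) < empty_batches_required:
        return False  # not enough history yet to decide
    tail = recent_progress[-empty_batches_required:]
    return all(p.get("numInputRows", 0) == 0 for p in tail)
```

In a monitoring loop you would call this with `query.recentProgress` before deciding the stream is idle enough to stop.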