
stop autoloader with continuous trigger programmatically

sanjay
Valued Contributor II

Hi,

I am running Auto Loader with a continuous trigger. How can I stop this trigger during specific times, but only when no data is pending and the current batch has finished processing? Also, how can I check how many records are pending in the queue and what the stream's current state is?

Regards,

Sanjay

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @sanjay, looking to effectively manage your Auto Loader's continuous trigger? Here are the key considerations:

Pausing the Trigger at Specific Times: If you need to halt the continuous trigger during certain hours, consider switching to a triggered pipeline. If you prefer to keep the continuous trigger, you can stop it programmatically through the workspace client or the Jobs REST API; just be sure to handle this carefully to avoid any data loss (see the sketch below).
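
As a rough illustration of the workspace-client route, the sketch below uses the Databricks Python SDK (databricks-sdk) to cancel the active run of the streaming job; it could be scheduled as a separate job at the hour you want to pause. JOB_ID is a placeholder, and cancelling a run interrupts the stream, so combine it with the idle check sketched further down if you want to avoid stopping mid-batch.

```python
# Hedged sketch: cancel the active run(s) of a continuous streaming job
# via the Databricks Python SDK. JOB_ID is a hypothetical placeholder;
# credentials are picked up from the environment.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
JOB_ID = 123  # replace with the job ID of your Auto Loader job

# Cancel every active run of the job
# (the SDK call maps to POST /api/2.1/jobs/runs/cancel).
for run in w.jobs.list_runs(job_id=JOB_ID, active_only=True):
    w.jobs.cancel_run(run_id=run.run_id)
```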

Checking for Pending Records: There is no direct API that returns an exact count of records waiting in the queue, but you can monitor the Structured Streaming query that backs your Auto Loader job: its progress reports and checkpoint files show how much work remains. Note that a stop or a trigger change only takes effect after the current micro-batch completes, so the query's status also tells you whether a batch is in flight. Since Databricks Auto Loader is built on Structured Streaming, understanding how its triggers work gives you better control over costs and ingestion behavior; see the sketch below.
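
As a concrete sketch of that monitoring, assuming the stream was started in the same notebook and `query` is the handle returned by writeStream (names here are illustrative): Auto Loader reports backlog metrics such as numFilesOutstanding and numBytesOutstanding in the source metrics of the progress report, and the query status tells you whether a micro-batch is running.

```python
import time

# `query` is assumed to be the StreamingQuery returned by writeStream.start().

def stop_when_idle(query, poll_seconds=30):
    """Stop the stream gracefully: only once no micro-batch is running
    and no new data is available. The checkpoint preserves state, so
    the stream can simply be restarted later."""
    while True:
        status = query.status  # {'message', 'isDataAvailable', 'isTriggerActive'}
        if not status["isDataAvailable"] and not status["isTriggerActive"]:
            query.stop()
            query.awaitTermination()
            return
        time.sleep(poll_seconds)

# Inspect the Auto Loader backlog from the latest progress report.
progress = query.lastProgress
if progress:
    metrics = progress["sources"][0].get("metrics", {})
    print("files outstanding:", metrics.get("numFilesOutstanding"))
    print("bytes outstanding:", metrics.get("numBytesOutstanding"))
    print("rows in last batch:", progress["numInputRows"])
```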

4 REPLIES

melbourne
New Contributor III

You can switch to a 'Triggered' pipeline in this case.

Next, create a job in Workflows and attach a trigger of type 'file arrival' to it. Then add the notebook and cluster to the job. If you're not using DLT, set the cluster's auto-termination to 0 minutes so it shuts down as soon as it's idle.

Now, whenever a file arrives in your landing location, the trigger fires and starts the cluster, which runs the notebook until the task finishes (see the sketch below).
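
A minimal sketch of the notebook for that triggered pattern (paths, format, and table name below are hypothetical): Trigger.AvailableNow processes all files that have accumulated in one or more micro-batches and then stops on its own, so the job run finishes and the cluster can shut down.

```python
# Hedged sketch of a file-arrival-triggered Auto Loader notebook.
# All paths and names are placeholders; `spark` is the ambient session
# in a Databricks notebook.
query = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")       # adjust to your source format
    .load("/mnt/landing/")                     # hypothetical landing location
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/landing")  # hypothetical
    .trigger(availableNow=True)                # drain the backlog, then exit
    .toTable("bronze.events"))                 # hypothetical target table

query.awaitTermination()  # block until the backlog is drained, then the job ends
```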

sanjay
Valued Contributor II

Thank you, melbourne. I cannot switch to a triggered pipeline for now. Is it possible to stop/pause the stream using the workspace client or the Jobs REST API?

Thanks,

Sanjay


