
stop autoloader with continuous trigger programmatically

sanjay
Valued Contributor II

Hi,

I am running Auto Loader with a continuous trigger. How can I stop this trigger at a specific time, but only if no data is pending and the current batch has finished? And how can I check how many records are pending in the queue, and the stream's current state?

Regards,

Sanjay

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @sanjay, here is how you can manage an Auto Loader stream running with a continuous trigger:

Pausing the Trigger at Specific Times: If you need to halt ingestion during certain hours, the cleanest option is to switch to a triggered pipeline. If you prefer to keep the continuous trigger, you can stop the stream programmatically through the workspace client or the Jobs REST API. Handle this carefully to avoid data loss, since cancelling a run can interrupt a micro-batch in flight.
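For example, here is a minimal sketch using the Databricks Python SDK (databricks-sdk), assuming the stream runs as a job whose ID you know; the JOB_ID below is a placeholder:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up auth from env vars or ~/.databrickscfg

JOB_ID = 123  # placeholder: the job that runs the Auto Loader notebook

# Cancel every active run of the job. Note that cancellation can interrupt
# a micro-batch in flight; checkpointing lets the stream recover, but a
# graceful in-notebook stop (see the next sketch) is safer.
for run in w.jobs.list_runs(job_id=JOB_ID, active_only=True):
    w.jobs.cancel_run(run_id=run.run_id)
```

You could run this from a small scheduled job so the stream is taken down at a fixed time.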

 

Checking Pending Records and Current State: There isn't a direct API that returns an exact count of records still waiting, but you can get close by monitoring the Structured Streaming query that backs your Auto Loader job: its progress reports and checkpoint files show what has been ingested and what is still outstanding. Also note that if you change the trigger while a micro-batch is running, the change only takes effect once that batch completes, so an active trigger is itself a signal that work is still in flight. Auto Loader is built on Structured Streaming, so understanding how triggers work gives you better control over costs and ingestion behavior.
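To make that concrete, here is a sketch of a graceful in-notebook stop. It polls the streaming query's progress, where Auto Loader reports backlog metrics; the numFilesOutstanding field name is taken from Databricks' streaming metrics and should be verified on your runtime version:

```python
import time

# `query` is the StreamingQuery handle returned by your Auto Loader
# writeStream (e.g. by .start() or .toTable()).

def stop_when_idle(query, poll_seconds=60):
    """Stop the stream once no files are pending and no batch is running."""
    while query.isActive:
        progress = query.lastProgress  # most recent micro-batch, as a dict
        if progress:
            # Auto Loader surfaces backlog metrics on the source.
            metrics = progress["sources"][0].get("metrics", {})
            pending_files = int(metrics.get("numFilesOutstanding", 0))
            if pending_files == 0 and not query.status["isTriggerActive"]:
                query.stop()              # queue drained, no batch in flight
                query.awaitTermination()
                break
        time.sleep(poll_seconds)
```

This satisfies both of your conditions: the stream is only stopped when the current batch is complete and nothing is left in the queue.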


4 REPLIES

melbourne
Contributor

You can switch to a 'Triggered' pipeline in this case.

Next, create a job in Workflows and attach a trigger of type 'file arrival' to it. Then add the notebook and cluster to the job. If you're not using DLT, set the cluster's auto-termination timeout to 0 minutes so the cluster shuts down as soon as it's idle.

Now, whenever a file arrives in your landing location, the trigger fires and starts the cluster, which runs the notebook until the task finishes.
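If you go this route, the notebook itself can use an availableNow trigger (Spark 3.3+ / DBR 10.4+) so each file-arrival run processes everything pending and then exits, letting the cluster shut down. A minimal sketch with placeholder paths and table name:

```python
# Paths, format and table name below are placeholders; adjust to your setup.
(spark.readStream
      .format("cloudFiles")                       # Auto Loader source
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/chk/schema")
      .load("/mnt/landing")                       # the file-arrival location
 .writeStream
      .option("checkpointLocation", "/mnt/chk/bronze")
      .trigger(availableNow=True)                 # drain the backlog, then stop
      .toTable("bronze_events"))
```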

sanjay
Valued Contributor II

Thank you melbourne. I can't switch to a triggered pipeline for now. Is it possible to stop/pause the stream using the workspace client or the Jobs REST API?

Thanks,

Sanjay

RamonaMraz
New Contributor II

Hello, I am new here. Can I ask a question?
