
How to ensure that a Databricks Run Submit run invoked from Airflow only runs one time?

cgrant
Databricks Employee

I am running jobs on Databricks from Airflow using the Run Submit API. Occasionally, a single submission results in the same run executing more than once at the same time. Why does this happen?

Accepted Solutions

brickster_2018
Databricks Employee

You can ensure idempotency by providing an idempotency token with the request. The token can be passed through the REST API, as described in the following doc:

https://kb.databricks.com/jobs/jobs-idempotency.html

The primary cause of duplicate runs is a client-side retry after a timeout. The client submits the request and waits for a response from the server (the Job Service). If, for whatever reason, the client does not get a response within its defined timeout period, it retries. However, if the initial request was successfully submitted to the Job Service, it still triggers a job run, and the retry request triggers another one, causing duplicate job runs. Using an idempotency token lets the Job Service recognize the retry as the same request, so a duplicate job run is not triggered.
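
To make the retry scenario concrete, here is a minimal Python sketch. It assumes the token is derived from Airflow identifiers that stay the same across retries of one task instance (DAG ID, task ID, run ID); that derivation is one possible choice, not something the doc above prescribes. The helper names and `FakeJobService` are hypothetical: the class is a toy in-memory stand-in for the Job Service, used only to illustrate the deduplication behavior, and it makes no network calls. A SHA-256 hex digest is used because it is always 64 characters long, which fits the token length limit as I understand it from the API documentation.

```python
import hashlib


def make_idempotency_token(dag_id, task_id, run_id):
    """Derive a token that is identical on every retry of the same task instance.

    Hypothetical helper: the choice of inputs is an assumption for illustration.
    """
    key = f"{dag_id}:{task_id}:{run_id}"
    # A SHA-256 hex digest is always 64 characters long.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_submit_payload(run_spec, token):
    """Return a copy of the run spec with the idempotency token attached."""
    payload = dict(run_spec)
    payload["idempotency_token"] = token
    return payload


class FakeJobService:
    """Toy stand-in for the Job Service (illustration only, not Databricks code)."""

    def __init__(self):
        self._runs_by_token = {}
        self._next_run_id = 1

    def submit(self, payload):
        token = payload.get("idempotency_token")
        # A request whose token was already seen returns the existing run.
        if token is not None and token in self._runs_by_token:
            return self._runs_by_token[token]
        run_id = self._next_run_id
        self._next_run_id += 1
        if token is not None:
            self._runs_by_token[token] = run_id
        return run_id


if __name__ == "__main__":
    service = FakeJobService()
    token = make_idempotency_token("nightly_etl", "submit_run", "scheduled__2024-01-01")
    payload = build_submit_payload({"run_name": "nightly_etl"}, token)

    first = service.submit(payload)   # original request; the response is "lost"
    retry = service.submit(payload)   # client retries after its timeout
    print(first == retry)             # the retry maps to the same run

    # Without a token, each submission starts a separate run.
    print(service.submit({"run_name": "nightly_etl"}) != service.submit({"run_name": "nightly_etl"}))
```

In a real setup, the payload would go to the Run Submit endpoint instead of the fake service. If you use the Airflow Databricks provider, check whether your version of `DatabricksSubmitRunOperator` accepts an `idempotency_token` argument, so the token can be set on the operator directly.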

