Data Engineering

How to ensure that a Databricks Run Submit run invoked from Airflow only runs one time?

cgrant
Databricks Employee

I am running jobs on Databricks using the Run Submit API from Airflow. I have noticed that, occasionally, the same run is launched more than once at the same time. Why does this happen?

Accepted Solutions

brickster_2018
Databricks Employee

You can make the submission idempotent by passing an idempotency token (the `idempotency_token` field) with the request. It is easy to pass through the REST API, as described in this doc:

https://kb.databricks.com/jobs/jobs-idempotency.html

The primary reason for duplicate runs is client-side retries. The client submits the request and waits for a response from the server (the Job Service). If, for whatever reason, the response does not arrive within the client's timeout, the client retries. But if the initial request did reach the Job Service, it has already triggered a job run, and the retry triggers a second one. With an idempotency token, the Job Service recognizes the retry as the same request and returns the existing run instead of starting a new one.
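To make the above concrete, here is a minimal sketch of how a token could be derived and attached to the Run Submit payload. The helper names, the example DAG/task identifiers, and the `FakeJobService` class are my own assumptions for illustration, not part of any Databricks or Airflow API; only the `idempotency_token` field comes from the Jobs API. The token is a SHA-256 hex digest, which is 64 characters long (the limit I understand the Jobs API to impose on this field). Including the Airflow try number is a design choice: HTTP-level retries within one attempt reuse the token, while a deliberate Airflow task retry gets a fresh run.

```python
import hashlib


def make_idempotency_token(dag_id: str, task_id: str, run_id: str, try_number: int) -> str:
    """Derive a stable token from the identity of one Airflow task attempt (hypothetical helper)."""
    raw = "|".join([dag_id, task_id, run_id, str(try_number)])
    # A SHA-256 hex digest is exactly 64 characters long.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def build_submit_payload(run_spec: dict, token: str) -> dict:
    """Return a copy of the run spec with the idempotency token attached."""
    payload = dict(run_spec)
    payload["idempotency_token"] = token
    return payload


class FakeJobService:
    """In-memory stand-in (an assumption, not the real service) that mimics token-based deduplication."""

    def __init__(self):
        self._runs_by_token = {}
        self._next_run_id = 1

    def submit(self, payload: dict) -> int:
        token = payload.get("idempotency_token")
        # A request carrying a known token returns the existing run.
        if token is not None and token in self._runs_by_token:
            return self._runs_by_token[token]
        run_id = self._next_run_id
        self._next_run_id += 1
        if token is not None:
            self._runs_by_token[token] = run_id
        return run_id


# Placeholder spec: cluster and task settings are omitted here.
spec = {"run_name": "example-run"}
token = make_idempotency_token("etl_dag", "submit_run", "scheduled__2024-01-01", 1)
payload = build_submit_payload(spec, token)

service = FakeJobService()
first = service.submit(payload)
retry = service.submit(payload)  # simulated retry after a client timeout
```

In this simulation `first` and `retry` refer to the same run, whereas submitting the bare `spec` twice would create two runs. If I recall correctly, the `DatabricksSubmitRunOperator` in the Airflow Databricks provider also accepts an `idempotency_token` argument, so check your provider version before building the request by hand.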

