Turn process intelligence into automated action: connect Celonis directly to Databricks.
Celonis shows you where your business processes break down — where they stall, deviate, and leak money. But seeing the problem isn't the same as fixing it.
Databricks is where your data pipelines and ML models run. By connecting Celonis Action Flows directly to the Databricks Jobs API, you turn process insights into automated data actions — no middleware, no webhook servers, no glue code. One HTTP call.
Celonis detects that a customer order has been in "warehouse processing" for longer than the 95th percentile. The Action Flow triggers a Databricks job that scores the order against a delivery delay prediction model, checks alternative fulfillment options in the lakehouse, and writes a recommended action back to the order management system.
Impact: Orders at risk of late delivery are flagged and rerouted before the customer notices — not after the SLA is breached.
Celonis flags a process execution that violates a segregation-of-duties policy — the same person created and approved a purchase order. The Action Flow triggers a Databricks job that logs the violation with full event context into an audit Delta table, runs a compliance scoring model across all recent transactions, and updates the compliance dashboard.
Impact: Violations are caught and documented in real time, with a full audit trail, not discovered during the quarterly review.
Celonis identifies that a process variant has shifted — a new pattern is emerging that the current prediction models don't account for. The Action Flow triggers a Databricks ML pipeline that extracts the latest process event data, retrains the model with the new variant, validates performance, and deploys the updated model to the serving endpoint.
Impact: ML models stay aligned with how the business actually operates today, not how it operated when the model was last trained.
No middleware. Celonis calls Databricks directly over HTTPS.
Before you start:
Click the trigger module and choose what starts the flow:
For this tutorial, choose Schedule and set it to every hour.
Fill in the HTTP module:
URL:
AWS: https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/run-now
Azure: https://adb-<workspace-id>.<shard>.azuredatabricks.net/api/2.1/jobs/run-now
GCP: https://<workspace-id>.gcp.databricks.com/api/2.1/jobs/run-now
Method: POST
Headers:
| Key | Value |
| --- | --- |
| Authorization | Bearer dapi... (your PAT) |
| Content-Type | application/json |
Body type: Raw (JSON)
Body:
{
  "job_id": 12345
}
Replace the workspace URL and job ID with your actual values.
Set Parse response to Yes so that later steps can reference the run_id returned in the response.
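To test the same call outside Celonis, the HTTP module settings above can be sketched in Python. This is a minimal sketch using only the standard library; the function name `build_run_now_request`, the example workspace URL, and the placeholder token are assumptions for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the HTTP module."""
    url = workspace_url.rstrip("/") + "/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values (assumptions): replace with your workspace URL, token and job ID.
request = build_run_now_request("https://example.cloud.databricks.com", "dapi-placeholder", 12345)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any real job is triggered.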
To send data from Celonis into the Databricks job — for example, a case ID or threshold value:
{
  "job_id": 12345,
  "notebook_params": {
    "case_id": "{{celonis.case_id}}",
    "event_type": "{{celonis.signal_name}}",
    "threshold_days": "5"
  }
}
In your Databricks notebook, read these with:
case_id = dbutils.widgets.get("case_id")
event_type = dbutils.widgets.get("event_type")
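Widget values arrive as strings, so a value such as threshold_days usually needs converting before use. The following is a minimal sketch of that step; the helper name `read_job_params` and the sample values are hypothetical, and a plain dict stands in for `dbutils.widgets.get` so the code also runs outside a notebook.

```python
from typing import Callable, Dict, Union


def read_job_params(get_widget: Callable[[str], str]) -> Dict[str, Union[str, int]]:
    """Read the three parameters and convert threshold_days to an integer."""
    case_id = get_widget("case_id").strip()
    if not case_id:
        raise ValueError("case_id is empty")
    return {
        "case_id": case_id,
        "event_type": get_widget("event_type"),
        "threshold_days": int(get_widget("threshold_days")),
    }


# In a notebook you would pass dbutils.widgets.get instead.
# The values below are made-up sample data.
sample = {"case_id": "PO-1001", "event_type": "late_order", "threshold_days": "5"}
params = read_job_params(sample.__getitem__)
print(params)
```

In a Databricks notebook the call would be `read_job_params(dbutils.widgets.get)`.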
If the request succeeds, Databricks returns a response like:
{
  "run_id": 67890,
  "number_in_job": 1
}
Add a Router after the HTTP module to split the flow into success and failure paths.
Configure the Success path:
Configure the Failure path:
Where to find "Status code" in the filter condition: click the value field and you will see the output variables from the previous HTTP module. Look for the Status code field — it is a number like 200, 400, 403, etc.
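The Router logic can be expressed as a small function for readers who want to reason about it outside the Action Flow editor. This is a sketch under the assumption that only status code 200 counts as success; the function name `route_response` and the sample failure body are illustrative, not part of Celonis or Databricks.

```python
import json


def route_response(status_code: int, body_text: str) -> dict:
    """Mimic the Router: 200 goes to the success path, anything else to failure."""
    if status_code == 200:
        # The parsed body carries the run_id needed by later steps.
        return {"path": "success", "run_id": json.loads(body_text)["run_id"]}
    return {"path": "failure", "status_code": status_code, "detail": body_text}


ok = route_response(200, '{"run_id": 67890, "number_in_job": 1}')
failed = route_response(403, "sample error body")
print(ok)
print(failed)
```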
If the Action Flow needs to wait for the Databricks job to finish before continuing:
AWS: GET https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id={{run_id}}
Azure: GET https://adb-<workspace-id>.<shard>.azuredatabricks.net/api/2.1/jobs/runs/get?run_id={{run_id}}
GCP: GET https://<workspace-id>.gcp.databricks.com/api/2.1/jobs/runs/get?run_id={{run_id}}
For {{run_id}}, click the field and pick run_id from the first HTTP module's output. If run_id is not listed, the response was probably not parsed as JSON: check that the first HTTP module has Parse response set to Yes, then run the flow at least once so Celonis can learn the output structure before you reference it.
Check the response:
After the second HTTP module (the GET status check), add a Router with two paths:
Path 1 — Job finished:
Path 2 — Job still running:
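One way to decide between the two paths is to look at the run state in the status response. The sketch below assumes the Jobs API 2.1 response exposes `state.life_cycle_state` and `state.result_state`; check the field names against the API reference for your workspace. The function name `classify_run` and its return labels are hypothetical.

```python
# Life cycle states after which the run will not change any more (assumed set).
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def classify_run(run: dict) -> str:
    """Return 'still_running', 'succeeded' or 'failed' for a runs/get response."""
    state = run.get("state", {})
    if state.get("life_cycle_state") not in TERMINAL_STATES:
        return "still_running"
    return "succeeded" if state.get("result_state") == "SUCCESS" else "failed"


print(classify_run({"state": {"life_cycle_state": "RUNNING"}}))
print(classify_run({"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}}))
```

Treating a finished run that did not succeed as its own outcome keeps a failed job from being mistaken for a completed one.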
Alternatively, use a Repeater module instead of a loop. Place it right after the first Router's success path:
Flow structure:
Trigger → HTTP (trigger job) → Router (200 or error) → Repeater (20 iterations) → Sleep 30s → HTTP (check status) → Router
  → "Job Done" → next step
  → "Still Running" → does nothing, Repeater continues
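The Repeater pattern above is a bounded polling loop: wait, check, and stop early once the job is done. Here is a minimal Python sketch of the same idea. The function `wait_for_run`, its parameters, and the set of finished states are assumptions for illustration; `fetch_run` stands in for the status-check HTTP call.

```python
import time
from typing import Callable, Optional

FINISHED = ("TERMINATED", "SKIPPED", "INTERNAL_ERROR")


def wait_for_run(
    fetch_run: Callable[[], dict],
    max_checks: int = 20,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll up to max_checks times; return the finished run, or None on timeout."""
    for _ in range(max_checks):
        sleep(delay_seconds)  # Sleep 30s
        run = fetch_run()  # HTTP (check status)
        if run.get("state", {}).get("life_cycle_state") in FINISHED:
            return run  # "Job Done"
        # "Still Running": do nothing, the loop continues
    return None


# Demo with canned responses and a no-op sleep so it finishes immediately.
responses = iter([
    {"state": {"life_cycle_state": "RUNNING"}},
    {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
])
final = wait_for_run(lambda: next(responses), sleep=lambda seconds: None)
print(final)
```

With the defaults, the loop gives up after roughly 10 minutes (20 checks at 30 seconds each), so longer jobs need a higher iteration count or a longer sleep.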
Setup:
| Practice | Why |
| --- | --- |
| Use a **service principal** instead of a personal PAT | Service principals are not tied to one user and can be scoped precisely |
| **Store the token** in Celonis connection/secret management | Never hardcode tokens in the Action Flow body |
| Set **PAT expiry to 90 days** and rotate | Limits the blast radius if the token leaks |
| Restrict PAT permissions | Only grant access to the specific jobs needed |
| IP allowlisting | Restrict Databricks API access to Celonis egress IPs |
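The same "never hardcode tokens" rule applies to any script you use to test the API call outside Celonis. A minimal sketch, assuming the token is supplied through an environment variable named `DATABRICKS_TOKEN`; the helper names are hypothetical, and a plain dict stands in for the real environment in the demo.

```python
import os
from typing import Mapping


def load_token(environ: Mapping[str, str] = os.environ, name: str = "DATABRICKS_TOKEN") -> str:
    """Read the token from the environment instead of hardcoding it."""
    token = environ.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set")
    return token


def auth_header(token: str) -> dict:
    """Build the Authorization header used by the HTTP module."""
    return {"Authorization": f"Bearer {token}"}


# Demo with a stand-in environment; real code would call load_token() with no arguments.
demo_env = {"DATABRICKS_TOKEN": "dapi-placeholder"}
print(auth_header(load_token(demo_env)))
```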
| Problem | Solution |
| --- | --- |
| 401 Unauthorized | PAT is invalid or expired; generate a new one |
| 403 Forbidden | PAT doesn't have permission for this job; check job permissions |
| 404 Not Found | Wrong workspace URL or job ID; double-check both |
| 400 Bad Request | Malformed JSON; validate syntax (missing comma, wrong quotes) |
| Job runs but fails | API call succeeded; check the job run logs in Databricks |
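If you log failures from the Action Flow or a test script, the table above can be turned into a lookup so each error carries its likely fix. This is a small sketch; the names `HINTS` and `troubleshooting_hint` and the fallback message are assumptions, and the hint texts are taken from the table.

```python
HINTS = {
    400: "Malformed JSON; validate syntax (missing comma, wrong quotes)",
    401: "PAT is invalid or expired; generate a new one",
    403: "PAT doesn't have permission for this job; check job permissions",
    404: "Wrong workspace URL or job ID; double-check both",
}


def troubleshooting_hint(status_code: int) -> str:
    """Map an HTTP status code to the matching hint from the table."""
    return HINTS.get(status_code, f"Unexpected status {status_code}; inspect the response body")


print(troubleshooting_hint(401))
print(troubleshooting_hint(500))
```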
Once the basic connection works, you can:
Process intelligence plus data intelligence. That's the closed loop.
For questions or feedback, reach out to your Databricks Account team.