Databricks Community

Tarzi-Simon

Turn process intelligence into automated action: connect Celonis directly to Databricks.

Why Connect Celonis Action Flows to Databricks?

Celonis shows you where your business processes break down — where they stall, deviate, and leak money. But seeing the problem isn't the same as fixing it.

Databricks is where your data pipelines and ML models run. By connecting Celonis Action Flows directly to the Databricks Jobs API, you turn process insights into automated data actions — no middleware, no webhook servers, no glue code. One HTTP call.

Use Cases

1. Late Delivery Prediction & Mitigation

Celonis detects that a customer order has been in "warehouse processing" for longer than the 95th percentile. The Action Flow triggers a Databricks job that scores the order against a delivery delay prediction model, checks alternative fulfillment options in the lakehouse, and writes a recommended action back to the order management system.

Impact: Orders at risk of late delivery are flagged and rerouted before the customer notices — not after the SLA is breached.

2. Automated Compliance Alerting

Celonis flags a process execution that violates a segregation-of-duties policy — the same person created and approved a purchase order. The Action Flow triggers a Databricks job that logs the violation with full event context into an audit Delta table, runs a compliance scoring model across all recent transactions, and updates the compliance dashboard.

Impact: Violations are caught and documented in real time, with a full audit trail, not discovered during the quarterly review.

3. Process-Aware Model Retraining

Celonis identifies that a process variant has shifted — a new pattern is emerging that the current prediction models don't account for. The Action Flow triggers a Databricks ML pipeline that extracts the latest process event data, retrains the model with the new variant, validates performance, and deploys the updated model to the serving endpoint.

Impact: ML models stay aligned with how the business actually operates today, not how it operated when the model was last trained.

Architecture

No middleware. Celonis calls Databricks directly over HTTPS.

Prerequisites

Before you start:

A Celonis account with access to Action Flows
A Databricks workspace with at least one job configured
A Databricks Personal Access Token (PAT) for the simplest setup; for production, OAuth machine-to-machine (M2M) authentication is also supported

Create a Databricks PAT

In Databricks, click your username (top-right)
Click Settings > Developer > Access tokens
Click Manage > Generate new token
Name it celonis-action-flow
Set an expiry (90 days recommended — rotate regularly)
Click Generate and copy the token immediately

Get Your Job ID

In Databricks, go to Workflows
Click the job you want to trigger
The Job ID is in the URL: https://<workspace>/#job/<job-id>
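
If you copy the job URL from the browser, the ID can also be pulled out programmatically. This is a minimal sketch, assuming the URL contains a job/<digits> or jobs/<digits> segment like the one shown above. The helper name job_id_from_url and the example hostname are illustrative and not part of any Databricks library.

```python
import re


def job_id_from_url(url: str) -> int:
    """Extract the numeric job ID from a Databricks job URL (hypothetical helper)."""
    # Accept both the '#job/<id>' form shown above and a 'jobs/<id>' path form.
    match = re.search(r"jobs?/(\d+)", url)
    if match is None:
        raise ValueError(f"No job ID found in URL: {url!r}")
    return int(match.group(1))


print(job_id_from_url("https://example.cloud.databricks.com/#job/12345"))
```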

Step-by-Step Tutorial

Step 1: Create an Action Flow

Log into Celonis
Navigate to Action Flows
Click Create Action Flow

Step 2: Configure the Trigger

Click the trigger module and choose what starts the flow:

Schedule — run on a timer (e.g., every hour)
Celonis Signal — triggered when a process condition is met (e.g., order stuck > 48 hours)
Manual — triggered by a user clicking a button

For this tutorial, choose Schedule and set it to every hour.

Step 3: Add an HTTP Module

Click + after the trigger
Search for HTTP and select Make a Request

Step 4: Configure the Databricks API Call

Fill in the HTTP module:

URL:
AWS: https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/run-now
Azure: https://adb-<workspace-id>.<shard>.azuredatabricks.net/api/2.1/jobs/run-now
GCP: https://<workspace-id>.gcp.databricks.com/api/2.1/jobs/run-now

Method: POST

Headers:

Key	Value
Authorization	Bearer dapi... (your PAT)
Content-Type	application/json

Body type: Raw (JSON)
Body:

{
  "job_id": 12345
}

Replace the workspace URL and job ID with your actual values.
Set Parse response to Yes so that later steps can reference the run_id returned in the response.
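
For reference, the same request can be expressed outside Celonis. The sketch below only builds the request with Python's standard library and does not send it; the hostname, token value, and the function name build_run_now_request are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_run_now_request(workspace_url: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST request described in Step 4."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute your own workspace URL, token, and job ID.
req = build_run_now_request("https://example.cloud.databricks.com", "dapi-placeholder", 12345)
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req), which needs a real workspace and a valid token.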

Step 5: Pass Dynamic Parameters (Optional)

To send data from Celonis into the Databricks job — for example, a case ID or threshold value:

{
  "job_id": 12345,
  "notebook_params": {
    "case_id": "{{celonis.case_id}}",
    "event_type": "{{celonis.signal_name}}",
    "threshold_days": "5"
  }
}
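
Note that every value under notebook_params is a string, including the number 5. As a rough Python equivalent of what the Action Flow does when it fills in the placeholders, the sketch below builds the body and converts each value to a string. The function name run_now_body and the sample case ID are made up for illustration.

```python
import json


def run_now_body(job_id: int, **params) -> str:
    """Return the JSON body for run-now; notebook_params values are sent as strings."""
    body = {"job_id": job_id}
    if params:
        # Convert every value to a string, matching the body shown above.
        body["notebook_params"] = {key: str(value) for key, value in params.items()}
    return json.dumps(body)


print(run_now_body(12345, case_id="C-001", threshold_days=5))
```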

In your Databricks notebook, read these with:

case_id = dbutils.widgets.get("case_id")
event_type = dbutils.widgets.get("event_type")
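
Widget values arrive as strings, so numeric parameters such as threshold_days need an explicit conversion. The sketch below wraps the lookups in a small helper that accepts any getter function, so it can be tried outside Databricks. The name read_params and the sample values are assumptions for illustration.

```python
from typing import Callable


def read_params(get_widget: Callable[[str], str]) -> dict:
    """Read the three parameters from Step 5 and convert the numeric one."""
    return {
        "case_id": get_widget("case_id"),
        "event_type": get_widget("event_type"),
        # Convert explicitly, because "5" arrives as a string, not a number.
        "threshold_days": int(get_widget("threshold_days")),
    }


# In a Databricks notebook: params = read_params(dbutils.widgets.get)
# Outside Databricks, a dict lookup can stand in for the widgets:
fake_widgets = {"case_id": "C-001", "event_type": "order_stuck", "threshold_days": "5"}
print(read_params(fake_widgets.__getitem__))
```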

Step 6: Test the Request

Click Run once in the Action Flow editor
You should get a 200 OK response:

{
  "run_id": 67890,
  "number_in_job": 1
}

Step 7: Verify in Databricks

Go to Workflows in Databricks
Click your job
You should see a new run triggered by the API call

Step 8: Add Error Handling

Add a Router after the HTTP module to split the flow into success and failure paths.

Click + after the HTTP module
Add a Router module — this splits the flow into two branches

Configure the Success path:

Click the dotted line on the first path
Click Set up a filter
Label: Success
Condition: set it to Status code Equal to 200
After this filter, add your next module (e.g., the polling step from Step 9, or a log entry)

Configure the Failure path:

Click the dotted line on the second path
Click Set up a filter
Label: Failure
Condition: set it to Status code Not equal to 200
After this filter, add an alert module:

Email — send yourself a failure notification
Slack — post an alert to a channel
Or a Text Aggregator — log the error for debugging

Where to find "Status code" in the filter condition: click the value field and you will see the output variables from the previous HTTP module. Look for the Status code field — it is a number like 200, 400, 403, etc.
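
The Router logic amounts to a single comparison on the status code. The sketch below mirrors the two filters in Python and, on failure, tries to surface a readable reason. It assumes that error responses carry a JSON body with a message field and falls back to the raw text otherwise; the function name route is illustrative.

```python
import json


def route(status_code: int, body_text: str = "") -> tuple:
    """Mimic the Router: return ('Success', None) or ('Failure', reason)."""
    if status_code == 200:
        return ("Success", None)
    try:
        parsed = json.loads(body_text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: error bodies carry a 'message' field; otherwise use the raw text.
    reason = parsed.get("message") if isinstance(parsed, dict) else None
    return ("Failure", f"HTTP {status_code}: {reason or body_text or 'no details'}")


print(route(200))
print(route(403, '{"error_code": "PERMISSION_DENIED", "message": "no access"}'))
```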

Step 9: Poll for Job Completion (Optional)

If the Action Flow needs to wait for the Databricks job to finish before continuing:

Add a Sleep module — wait 30 seconds
Add another HTTP module:

AWS: GET https://<your-workspace>.cloud.databricks.com/api/2.1/jobs/runs/get?run_id={{run_id}}
Azure: GET https://adb-<workspace-id>.<shard>.azuredatabricks.net/api/2.1/jobs/runs/get?run_id={{run_id}}
GCP: GET https://<workspace-id>.gcp.databricks.com/api/2.1/jobs/runs/get?run_id={{run_id}}

To fill in {{run_id}}, click the field and pick run_id from the first HTTP module's output. If no parsed body appears, the response was probably not parsed as JSON: check that the first HTTP module has Parse response set to Yes, and run the flow at least once so Celonis can learn the output structure before you reference it.
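
The mapping from the first response to the status URL can be sketched in Python as follows. The hostname is a placeholder and status_url is a hypothetical helper; the response text is the sample from Step 6.

```python
import json
from urllib.parse import urlencode


def status_url(workspace_url: str, run_now_response: str) -> str:
    """Build the runs/get URL from the raw JSON body returned by run-now."""
    run_id = json.loads(run_now_response)["run_id"]
    query = urlencode({"run_id": run_id})
    return f"{workspace_url.rstrip('/')}/api/2.1/jobs/runs/get?{query}"


print(status_url("https://example.cloud.databricks.com", '{"run_id": 67890, "number_in_job": 1}'))
```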

Check the response:

After the second HTTP module (the GET status check), add a Router with two paths:

Path 1 — Job finished:

Click the dotted line on the first path
Set up a filter
Label: Job Done
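Condition: data.state.life_cycle_state Equal to TERMINATED (this means the run has ended; data.state.result_state tells you whether it succeeded)
Add a notification module after the filter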

Path 2 — Job still running:

Click the dotted line on the second path
Set up a filter
Label: Still Running
Condition: data.state.life_cycle_state Not equal to TERMINATED
Set up notifications
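
The two filters can be mirrored in a few lines of Python. The sketch below takes the parsed runs/get response (the object Celonis exposes under data) and returns the path label together with the result_state field, so that a finished run can be told apart from a successful one. The function name classify_run and the sample responses are illustrative.

```python
def classify_run(run: dict) -> tuple:
    """Return (path label, result_state) for a parsed runs/get response."""
    state = run.get("state", {})
    if state.get("life_cycle_state") == "TERMINATED":
        # TERMINATED only says the run has ended; result_state says how it ended.
        return ("Job Done", state.get("result_state"))
    return ("Still Running", None)


print(classify_run({"state": {"life_cycle_state": "RUNNING"}}))
print(classify_run({"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}}))
```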

Alternatively, use a Repeater module instead of a loop. Place it right after the first Router's success path:

Flow structure:

Trigger → HTTP (trigger job) → Router (200 or error) → Repeater (20 iterations) → Sleep 30s → HTTP (check status) → Router
    "Job Done" → next step
    "Still Running" → does nothing, repeater continues

Setup:

After the success path of the first Router, add a Repeater module
Set Repeats to 20 (that gives you 20 x 30 seconds = 10 minutes max wait)
After the Repeater, add a Sleep module — set to 30 seconds
After Sleep, add the HTTP module (GET status check)
After the HTTP module, add a Router with a filter on the first path:
Label: Job Done
Condition: data.state.life_cycle_state Equal to TERMINATED
On the "Job Done" path, continue with your next steps (or just end the flow)
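
The Repeater setup corresponds to a bounded polling loop: sleep, check the status, stop when the run has terminated, and give up after a fixed number of attempts. The sketch below is a Python analogue under stated assumptions. The status check is passed in as a function so the example runs without a workspace, the name wait_for_run is made up, and unlike the flow described above it returns at the first TERMINATED result.

```python
import time
from typing import Callable, Optional


def wait_for_run(
    get_life_cycle_state: Callable[[], str],
    repeats: int = 20,
    sleep_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the 1-based iteration at which the run terminated, or None on timeout."""
    for iteration in range(1, repeats + 1):
        sleep(sleep_seconds)  # corresponds to the Sleep module
        if get_life_cycle_state() == "TERMINATED":  # status check plus Router filter
            return iteration
    return None  # 20 x 30 seconds elapsed without reaching TERMINATED


# Demo with a fake status source and no real waiting.
states = iter(["PENDING", "RUNNING", "TERMINATED"])
print(wait_for_run(lambda: next(states), sleep=lambda seconds: None))
```

With the defaults, the maximum wait is 20 x 30 seconds, or 10 minutes, matching the Repeater configuration.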

Step 10: Activate

Click Activate to turn the Action Flow on
It now runs automatically based on your trigger

Security Best Practices

Practice	Why
Use a service principal instead of a personal PAT	Service principals are not tied to one user and can be scoped precisely
Store the token in Celonis connection/secret management	Never hardcode tokens in the Action Flow body
Set PAT expiry to 90 days and rotate	Limits the blast radius if the token leaks
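Restrict the token owner's permissions	A PAT carries the permissions of the identity that created it, so grant that user or service principal access only to the specific jobs needed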
IP allowlisting	Restrict Databricks API access to Celonis egress IPs
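
If you also call the API from your own scripts while testing, the same rule about not hardcoding tokens applies. This is a minimal sketch, assuming the token is supplied through an environment variable named DATABRICKS_TOKEN; the helper names are illustrative, and inside Celonis the equivalent is the connection or secret store mentioned in the table.

```python
import os


def load_token(env=os.environ, name: str = "DATABRICKS_TOKEN") -> str:
    """Read the token from the environment instead of hardcoding it."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable; do not hardcode the token.")
    return token


def auth_header(token: str) -> dict:
    """Build the Authorization header used in Step 4."""
    return {"Authorization": f"Bearer {token}"}


# Demo with an in-memory mapping standing in for the real environment.
print(auth_header(load_token(env={"DATABRICKS_TOKEN": "dapi-placeholder"})))
```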

Troubleshooting

Problem	Solution
401 Unauthorized	PAT is invalid or expired — generate a new one
403 Forbidden	PAT doesn't have permission for this job — check job permissions
404 Not Found	Wrong workspace URL or job ID — double-check both
400 Bad Request	Malformed JSON — validate syntax (missing comma, wrong quotes)
Job runs but fails	API call succeeded — check the job run logs in Databricks

What's Next

Once the basic connection works, you can:

Chain jobs — trigger a sequence of Databricks jobs from a single Action Flow
Write results back — use the Celonis Data Push API to send Databricks output back into Celonis data models
Call ML endpoints — hit a Databricks Model Serving endpoint for real-time predictions instead of batch jobs
Build closed loops — detect issue in Celonis → analyze in Databricks → push fix to source system → verify in Celonis

Process intelligence plus data intelligence. That's the closed loop.

For questions or feedback, reach out to your Databricks Account team.