We needed job_id and run_id in a custom metrics Delta table so we could join to `system.lakeflow.job_run_timeline`. Tried four approaches before finding the one that works on serverless compute.
## What doesn't work
### `spark.conf.get("spark.databricks.job.id")`

Throws `CONFIG_NOT_AVAILABLE` on serverless. The key exists on classic compute but isn't exposed through the Spark Connect protocol.
### `os.environ["DATABRICKS_JOB_ID"]`

Not a real environment variable. Databricks sets `DATABRICKS_RUNTIME_VERSION` and cluster library paths, but nothing carrying job identity.
### `dbutils.notebook.entry_point.getDbutils().notebook().getContext()`

Works in notebook tasks. Fails in Python wheel tasks with `module has no attribute 'notebook'`.
### `spark_env_vars` with `{{job.id}}`

Dynamic value references don't resolve in `spark_env_vars`. The value passes through as the literal string `{{job.id}}`.
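A quick way to confirm these dead ends on your own compute is a small probe that tries each source and records the failure instead of crashing. A minimal sketch (the helper name and signature are made up; `spark` and `dbutils` are injected so the probe also runs off-platform):

```python
import os


def probe_job_identity(spark=None, dbutils=None):
    """Try each candidate source for job identity; never raise.

    Returns a dict mapping source name -> value or error string.
    Hypothetical helper: pass in the real spark/dbutils handles on
    Databricks; both default to None so this runs anywhere.
    """
    results = {}

    # 1. Spark conf key (raises CONFIG_NOT_AVAILABLE on serverless)
    try:
        results["spark_conf"] = spark.conf.get("spark.databricks.job.id")
    except Exception as exc:  # we want the error text, not a crash
        results["spark_conf"] = f"error: {exc}"

    # 2. Environment variable (not set by Databricks)
    results["env_var"] = os.environ.get("DATABRICKS_JOB_ID", "not set")

    # 3. dbutils notebook context (missing in wheel tasks)
    try:
        ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
        results["dbutils_ctx"] = str(ctx)
    except Exception as exc:
        results["dbutils_ctx"] = f"error: {exc}"

    return results
```

Run it once as each task type and diff the dicts: whatever comes back as `error:` or `not set` on your compute is a source you can rule out.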
## What works
Job-level parameters with dynamic value references, piped into the task's `named_parameters`:

```yaml
parameters:
  - name: job_id
    default: "{{job.id}}"
  - name: run_id
    default: "{{job.run_id}}"
tasks:
  - python_wheel_task:
      named_parameters:
        job_id: "{{job.parameters.job_id}}"
        run_id: "{{job.parameters.run_id}}"
```
The values arrive via `sys.argv`. Parse them with argparse:

```python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--job_id", type=int, default=None)
parser.add_argument("--run_id", type=int, default=None)

# parse_known_args tolerates any extra flags the platform appends
args, _ = parser.parse_known_args(sys.argv[1:])
```
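Each entry in `named_parameters` arrives as a `--key value` pair, so for the job above the wheel entry point sees argv along these lines (the ID values here are made up):

```python
import argparse

# Simulate the argv a wheel entry point receives when the task's
# named_parameters are job_id and run_id (example values are made up)
argv = ["--job_id", "1234567890", "--run_id", "987654321"]

parser = argparse.ArgumentParser()
parser.add_argument("--job_id", type=int, default=None)
parser.add_argument("--run_id", type=int, default=None)
args, extra = parser.parse_known_args(argv)

print(args.job_id, args.run_id)  # 1234567890 987654321
```

`parse_known_args` rather than `parse_args` means the entry point won't blow up if the platform ever passes flags you didn't declare.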
## Bonus: `dbruntime.databricks_repl_context` also works

```python
from dbruntime.databricks_repl_context import get_context

ctx = get_context()
job_id = ctx.jobId
run_id = ctx.idInJob
```
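If you take the context route anyway, it's worth guarding it, since `dbruntime` only exists on Databricks and the fields are undocumented. A defensive wrapper (hypothetical helper name) that degrades to `(None, None)` off-platform:

```python
def job_identity_from_context():
    """Return (job_id, run_id) from the REPL context, or (None, None).

    Hypothetical helper: dbruntime only exists on Databricks, and the
    jobId/idInJob attributes are undocumented, so everything is guarded.
    """
    try:
        from dbruntime.databricks_repl_context import get_context
    except ImportError:
        return (None, None)

    ctx = get_context()
    if ctx is None:
        return (None, None)
    return (getattr(ctx, "jobId", None), getattr(ctx, "idInJob", None))
```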
Undocumented but functional in both script and wheel tasks on serverless. We went with `named_parameters` because it's the documented approach.
## How I figured this out
Wrote a 30-line test script that dumps `sys.argv`, all env vars, the Spark conf, and the dbutils context. Created a Databricks job with job parameters set to `{{job.id}}` and `{{job.run_id}}`, and ran it once. The output showed exactly which sources held real values and which were empty.
Sometimes the fastest path to the answer is the oldest trick: print everything, read the output.
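A sketch of that kind of dump script, with every Databricks-only source guarded so the same file also runs locally (`spark` and `dbutils` are the globals Databricks injects; off-platform the lookups fail and print the error instead):

```python
import json
import os
import sys


def dump_all_sources():
    """Print every candidate source of job identity.

    Databricks-only names (spark, dbutils) are referenced inside
    try/except so the script runs anywhere; run it once per task
    type and compare the output.
    """
    print("=== sys.argv ===")
    print(json.dumps(sys.argv))

    print("=== env vars (DATABRICKS/JOB only) ===")
    for key in sorted(os.environ):
        if "DATABRICKS" in key or "JOB" in key:
            print(f"{key}={os.environ[key]}")

    print("=== spark conf: spark.databricks.job.id ===")
    try:
        print(spark.conf.get("spark.databricks.job.id"))  # noqa: F821
    except Exception as exc:
        print(f"unavailable: {exc}")

    print("=== dbutils context ===")
    try:
        print(dbutils.notebook.entry_point.getDbutils()  # noqa: F821
              .notebook().getContext())
    except Exception as exc:
        print(f"unavailable: {exc}")
```

Call `dump_all_sources()` from a notebook task, a script task, and a wheel task, and the three outputs tell you which sources are real in each context.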
Full blog post with the story behind these findings: link