Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Access task level parameters along with parameters passed by airflow job

divyab7
New Contributor II

I have an Airflow DAG that calls a Databricks job with a task-level parameter defined as job_run_id (job.run_id) and a task type of python_script. When I try to access the parameters using sys.argv in the spark_python_task, it only prints the JSON that was passed through the Airflow job. I want sys.argv to receive both the parameters passed by the DAG and the ones defined on the Databricks job.

We have a use case where we don't want to use anything related to dbutils. It's a plain Python script, so we want it to stay independent of dbutils.
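
For context, a stripped-down version of the script (illustrative, not the real one) does nothing more than read sys.argv:

import sys

# Entry point of the python_script task. In this setup, sys.argv only ever
# contains the arguments sent in the run payload, i.e. the JSON passed from
# the Airflow DAG; the task-level job_run_id parameter never shows up here.
if __name__ == "__main__":
    print(sys.argv[1:])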


5 REPLIES

Isi
Honored Contributor II

Hey @divyab7 

I ran into the same thing. The short version is: for a spark_python_task, the script only receives the arguments you send in the run payload, and Databricks does not automatically merge "job-level" parameters with the ones you pass at run time.

What worked for me was to build the job dynamically from Airflow: I keep a small YAML (or dict) with the job defaults (cluster type, wheels, and also any default CLI args I want), and then, when the DAG runs, I merge those defaults with the DAG's dynamic values (like data_interval_start / data_interval_end). The result is a single, flat list of CLI parameters that I send in the parameters field of the run request.

This way, inside the Python script I don't rely on dbutils at all: I just parse the CLI args and everything is there (both the job defaults and the DAG-specific values). The key point is that run-time parameters replace the job's parameters unless you merge them yourself before submitting the run. This approach keeps the job configurable (cluster/image/wheels can change via config) and, at the same time, injects all execution info into the script in a simple, dependency-free way.
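
Roughly, the merge step can look like this (simplified sketch, not my exact setup: the paths, cluster config and names are illustrative, and it assumes the Databricks Airflow provider's DatabricksSubmitRunOperator):

import pendulum
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Static job defaults; in my case these come from a small YAML file.
JOB_DEFAULTS = {
    "python_file": "dbfs:/scripts/my_task.py",  # illustrative path
    "default_args": ["--env", "prod"],          # default CLI args
    "cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
}

with DAG(
    dag_id="databricks_python_script",          # illustrative name
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
) as dag:
    # Merge the static defaults with the DAG's dynamic values into one flat
    # list of CLI parameters; this list is what the script sees in sys.argv.
    parameters = JOB_DEFAULTS["default_args"] + [
        "--data-interval-start", "{{ data_interval_start }}",
        "--data-interval-end", "{{ data_interval_end }}",
    ]

    run_script = DatabricksSubmitRunOperator(
        task_id="run_python_script",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "airflow-triggered-run",
            "new_cluster": JOB_DEFAULTS["cluster"],
            "spark_python_task": {
                "python_file": JOB_DEFAULTS["python_file"],
                "parameters": parameters,
            },
        },
    )

Inside the script, a plain argparse parser then picks everything up from that flat list.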

Tell me if you need more details, 🙂

Isi

divyab7
New Contributor II

Thank you for your response. Can you please give me an example of how to implement this? Should it be structured in a certain way, or do you have a fuller code example?

divyab7
New Contributor II

My use case is that we need job.run_id, and we only get it once the job is triggered; the Python script invoked by the Databricks job needs it in order to move forward. I'm still confused: even if we merge the parameters, how is that going to replace the dynamic value reference in Databricks? Can you please provide a small code example?

Isi
Honored Contributor II

Hey @divyab7 

Sorry, now I understand better what you actually need. I got confused at first and thought you only wanted to access the parameters you pass through Airflow.

I think the dynamic identifiers that Databricks generates at runtime (like run IDs) are not injected into sys.argv automatically.

One way I have been thinking of to get them without using dbutils is:

  • Job ID → you can extract it from spark.conf.get("spark.databricks.clusterUsageTags.clusterName"), which has a value like job-<job_id>-run-<task_run_id>.

  • Job run ID → once you have the job_id, you can call the Databricks Jobs API and retrieve the job_run_id (see the sketch below).

 

This approach should work, but I agree it's not very straightforward. Databricks could definitely make it easier to expose these values directly in the runtime context instead of having to parse them or query the API.
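
Something along these lines is what I had in mind (rough, untested sketch; how you make the workspace URL and a token available to the script is up to you, here I just read them from environment variables):

import os
import re

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# On a job cluster this conf has a value like "job-<job_id>-run-<task_run_id>"
cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
match = re.match(r"job-(\d+)-run-(\d+)", cluster_name)
job_id, task_run_id = match.group(1), match.group(2)

# Query the Jobs API for the currently active run of this job.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # a token you make available to the cluster

resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"job_id": job_id, "active_only": "true"},
    timeout=30,
)
resp.raise_for_status()
runs = resp.json().get("runs", [])
job_run_id = runs[0]["run_id"] if runs else None

print(job_id, task_run_id, job_run_id)

If several runs of the same job can be active at once, you would still need to disambiguate them, for example by matching the task_run_id from the cluster name against the tasks of each run.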

Hope this helps, 😥
Isi

divyab7
New Contributor II

This was really helpful. Thank you for the response 😊
