2 weeks ago
I have an Airflow DAG that calls a Databricks job with a task-level parameter defined as job_run_id (job.run_id), where the task type is python_script. When I try to access it using sys.argv in the spark_python_task, it only prints the JSON that was passed through the Airflow job. I want sys.argv to be able to get both the parameters passed by the DAG and the ones defined on the Databricks job.
We have a use case where we don't want to use anything related to dbutils. It's a plain Python script, so we want it to be independent of dbutils.
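For reference, this is roughly what the entry point of my script does today (simplified sketch; the real script is bigger):

```python
import json
import sys

if __name__ == "__main__":
    # All the script sees today: the parameters sent in the Airflow run payload.
    print(sys.argv[1:])

    # The Airflow side passes a single JSON string; job-level parameters like
    # job_run_id never show up here.
    params = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
    print(params)
```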
a week ago
Hey @divyab7
Hi! I ran into the same thing. The short version is: for spark_python_task, the script only receives the arguments you send in the run payload, and Databricks does not automatically merge "job-level" parameters with the ones you pass at run time. What worked for me was to build the job dynamically from Airflow: I keep a small YAML (or dict) with the job defaults (cluster type, wheels, and also any default CLI args I want), and then, when the DAG runs, I merge those defaults with the DAG's dynamic values (like data_interval_start / data_interval_end). The result is a single, flat list of CLI parameters that I send in the parameters field of the run request.
This way, inside the Python script I don't rely on dbutils at all; I just parse the CLI args and everything is there (both the job defaults and the DAG-specific values). The key point is that run-time parameters replace the job's parameters unless you merge them yourself before submitting the run. This approach keeps the job configurable (cluster/image/wheels can change via config), and at the same time injects all execution info into the script in a simple, dependency-free way.
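A minimal sketch of the merge step, assuming you submit the run with DatabricksSubmitRunOperator; the defaults, cluster ID, and script path below are illustrative placeholders:

```python
# Airflow side (sketch): merge static job defaults with the DAG's run-time
# values and submit them as one flat list of CLI parameters.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Defaults that would normally live in a YAML/config file (illustrative values).
job_defaults = {
    "env": "prod",
    "output_path": "s3://my-bucket/output/",
}

# Dynamic values from the DAG run; the operator's json field is templated,
# so these Jinja expressions are rendered when the task runs.
run_values = {
    "data_interval_start": "{{ data_interval_start }}",
    "data_interval_end": "{{ data_interval_end }}",
}

merged = {**job_defaults, **run_values}  # run-time values win on key conflicts
parameters = [f"--{key}={value}" for key, value in merged.items()]

with DAG(
    dag_id="databricks_python_script",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    submit_run = DatabricksSubmitRunOperator(
        task_id="run_python_script",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "python_script_run",
            "existing_cluster_id": "1234-567890-abcdefgh",  # or a new_cluster spec
            "spark_python_task": {
                "python_file": "dbfs:/scripts/my_script.py",
                "parameters": parameters,
            },
        },
    )
```

Inside the script I then just parse the flags with argparse; nothing Databricks-specific is needed.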
Tell me if you need more details,
Isi
a week ago
Thank you for your response. Can you please give me an example of how to implement this? Should it be implemented in a certain way, or do you have any code example?
a week ago
My use case is that we need job.run_id, which only exists once the job is triggered, and the Python script invoked by the Databricks job needs it in order to move forward. I am still confused: even if we merge the parameters, how is that going to resolve the dynamic value reference in Databricks? Can you please provide a small code example?
Tuesday
Hey @divyab7
Sorry, now I understand better what you actually need. I got confused at first and thought you only wanted to access the parameters you pass through Airflow.
I think the dynamic identifiers that Databricks generates at runtime (like run IDs) are not injected into sys.argv automatically.
A way I have been thinking of to get them without using dbutils is:
Job ID → you can extract it from spark.conf.get("spark.databricks.clusterUsageTags.clusterName"), which has a value like job-<job_id>-run-<task_run_id>.
Job run ID → once you have the job_id, you can call the Databricks Jobs API and retrieve the job_run_id.
This approach should work, but I agree it's not very straightforward. Databricks could definitely make it easier to expose these values directly in the runtime context instead of having to parse them or query the API.
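If it helps, here is a minimal sketch of both steps. It assumes the script can reach the workspace REST API with a host and token you provide yourself (the environment variable names are just placeholders), and it uses the jobs/runs/list endpoint to find the parent run:

```python
# Inside the Python script (sketch): recover the job/run IDs without dbutils.
import os
import re

import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# On a job cluster this tag looks like "job-<job_id>-run-<run_id>".
cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")
match = re.match(r"job-(\d+)-run-(\d+)", cluster_name)
job_id, run_id_from_tag = match.group(1), match.group(2)

# Workspace URL and token, provided by you (placeholder variable names).
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

# List the active runs of this job and find the one that owns our task run.
resp = requests.get(
    f"{host}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"job_id": job_id, "active_only": "true", "expand_tasks": "true"},
    timeout=30,
)
resp.raise_for_status()

job_run_id = None
for run in resp.json().get("runs", []):
    task_run_ids = {str(task.get("run_id")) for task in run.get("tasks", [])}
    # For single-task jobs, the ID in the cluster name may already be the job run ID.
    if run_id_from_tag in task_run_ids or str(run.get("run_id")) == run_id_from_tag:
        job_run_id = run["run_id"]
        break

print(f"job_id={job_id}, job_run_id={job_run_id}")
```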
Hope this helps,
Isi
Tuesday
This was really helpful. Thank you for the response!