08-24-2025 12:35 PM
Hey @divyab7,
I ran into the same thing. The short version: with spark_python_task, the script only receives the arguments you send in the run payload; Databricks does not automatically merge job-level parameters with the ones you pass at run time. What worked for me was to build the job dynamically from Airflow. I keep a small YAML (or dict) with the job defaults (cluster type, wheels, and any default CLI args I want), and when the DAG runs I merge those defaults with the DAG's dynamic values (like data_interval_start / data_interval_end). The result is a single, flat list of CLI parameters that I send in the parameters field of the run request (see the sketch below).
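Here is a minimal sketch of the Airflow side of that, assuming the Databricks provider's DatabricksSubmitRunOperator and a hypothetical defaults file at jobs/my_job.yaml with keys like new_cluster, libraries, python_file, and cli_defaults (the file layout, DAG id, task id, and flag names are all illustrative, not a fixed convention):

```python
# Sketch: merge YAML job defaults with per-run Airflow values into one flat
# CLI parameter list for spark_python_task. Config path and keys are assumptions.
import pendulum
import yaml
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator


def build_parameters(defaults: dict, run_args: dict) -> list:
    """Merge static CLI defaults with per-run values into one flat --key value list."""
    merged = {**defaults, **run_args}  # run-time values win on key collisions
    flat = []
    for key, value in merged.items():
        flat += [f"--{key}", str(value)]
    return flat


with open("jobs/my_job.yaml") as f:  # hypothetical config path
    cfg = yaml.safe_load(f)

with DAG(
    dag_id="databricks_spark_python_example",
    start_date=pendulum.datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    DatabricksSubmitRunOperator(
        task_id="run_script",
        json={
            "run_name": "spark_python_example",
            "new_cluster": cfg["new_cluster"],      # cluster spec from the YAML
            "libraries": cfg.get("libraries", []),  # e.g. the wheel(s)
            "spark_python_task": {
                "python_file": cfg["python_file"],
                # 'json' is a templated field, so the Jinja values below are
                # rendered per run before the request is submitted.
                "parameters": build_parameters(
                    cfg.get("cli_defaults", {}),
                    {
                        "start": "{{ data_interval_start }}",
                        "end": "{{ data_interval_end }}",
                    },
                ),
            },
        },
    )
```

Because everything ends up in one flat list, the script sees the job defaults and the run-specific values through exactly the same channel.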
Inside the Python script I don't rely on dbutils at all; I just parse the CLI args and everything is there (both the job defaults and the DAG-specific values), as in the second sketch below. The key point is that run-time parameters replace the job's parameters unless you merge them yourself before submitting the run. This approach keeps the job configurable (cluster/image/wheels can change via config) while injecting all execution info into the script in a simple, dependency-free way.
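And the script side is just plain argparse; the flag names have to mirror whatever you flattened in the DAG (the --start / --end / --output-table names here are only examples, not fixed arguments):

```python
# Sketch of the spark_python_task entry point: no dbutils, just CLI parsing.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="spark_python_task entry point")
    parser.add_argument("--start", required=True, help="data_interval_start from Airflow")
    parser.add_argument("--end", required=True, help="data_interval_end from Airflow")
    # Example of a job-default argument carried in the YAML (hypothetical name).
    parser.add_argument("--output-table", default=None, help="target table for the run")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"Processing window {args.start} -> {args.end}")
```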
Let me know if you need more details 🙂
Isi