Can't pass dynamic parameters to non-notebook Python job (spark_python_task)

zed
Databricks Partner

I need to access the date of a given job that runs as a non-notebook Python job (spark_python_task). I want to pass a value from the CLI when running the job and then read that value in the script.

I tried the approaches in the attached image when running:

bundle run my_job --params run_date=20240101

Walter_C
Databricks Employee

Job parameters are automatically pushed down as key-value parameters to all tasks that accept key-value parameters, which include the following task types:

  • Notebook

  • Python wheel (only when configured with keyword arguments)

  • SQL query, legacy dashboard, or file

  • Run Job

zed
Databricks Partner

Hi, thank you for your response. I have a few follow-up questions to clarify best practices when it comes to passing parameters with Python files:

  1. If I want to pass parameters, should I avoid using spark_python_task for Python scripts?
  2. In the context of using Databricks Asset Bundles, is it generally discouraged to submit Databricks jobs using Python files (vs. notebooks)?
  3. I was able to pass parameters with `--python-params`, for example `--run_date 20240101`, and then load them with an argument parser. Is it accurate to say that spark_python_task does not support key-value parameter passing? If so, what would you recommend if I want to keep my project in Python files rather than notebooks while still being able to pass parameters?
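
For reference, the script side of the `--python-params` approach from question 3 could look roughly like the sketch below. It assumes the arguments reach the script as ordinary command-line arguments and that the date uses the `YYYYMMDD` format from the example above. The function name `parse_args` is only illustrative.

```python
import argparse
from datetime import datetime


def parse_args(argv=None):
    """Parse task parameters; with argv=None, argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Example Python script task")
    parser.add_argument(
        "--run_date",
        required=True,
        # Convert the YYYYMMDD string (assumed format) into a datetime.date.
        type=lambda s: datetime.strptime(s, "%Y%m%d").date(),
        help="Run date in YYYYMMDD format, e.g. 20240101",
    )
    return parser.parse_args(argv)


# In the real script, call parse_args() with no arguments so that the
# values passed to the task are used. An explicit list is used here
# only to demonstrate the parsing.
args = parse_args(["--run_date", "20240101"])
print(args.run_date.isoformat())  # prints 2024-01-01
```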

Thank you for your help with this!

zed
Databricks Partner

I have one more question. To clarify, my previous questions were about jobs that are triggered manually, without scheduling.

For scheduled jobs—particularly those using Python script tasks—how can I configure the `job.yml` resource and the Python script to dynamically retrieve `{{job.start_time.iso_date}}` at runtime?

Thanks again!

Walter_C
Databricks Employee

zed
Databricks Partner

So I think it is OK to change the spark_python_task to a notebook_task while keeping the file as a Python file instead of a notebook. Now I can easily use Databricks widgets to retrieve those parameters, and I still keep Python files rather than notebooks under version control.
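
A minimal sketch of reading the parameter this way, assuming the file runs as a notebook_task so that `dbutils` is available as a global. The helper `get_param` and its fallback default are hypothetical and only there so the same file can also be run locally.

```python
def get_param(name, default=None):
    """Return a task parameter from a Databricks widget, or `default` elsewhere."""
    try:
        # `dbutils` is provided as a global in the Databricks notebook context;
        # it is not defined in a plain local Python session.
        return dbutils.widgets.get(name)  # noqa: F821
    except NameError:
        # Not running on Databricks (e.g. a local test): use the default.
        return default


# `run_date` is the parameter name used earlier in this thread.
run_date = get_param("run_date", default="20240101")
print(run_date)
```

Outside Databricks this falls back to the default value. Inside a notebook task, the widget lookup is expected to return the value passed as a job or task parameter.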