Data Engineering

Can't pass dynamic parameters to non-notebook Python job (spark_python_task)

zed
New Contributor III

I need to access the run date of a job that runs as a non-notebook Python job (spark_python_task). I want to pass a value from the CLI when triggering the run and be able to access that value inside the script.

I tried the approaches shown in the attached image when running:

bundle run my_job --params run_date=20240101
5 REPLIES

Walter_C
Databricks Employee

Job parameters are automatically pushed down as key-value parameters to all tasks that accept key-value parameters, which include the following task types:

  • Notebook

  • Python wheel (only when configured with keyword arguments)

  • SQL query, legacy dashboard, or file

  • Run Job
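
For reference, a minimal sketch of how this push-down looks in a bundle config, assuming a hypothetical job my_job with a notebook at ./src/my_notebook.py; job-level parameters defined here are forwarded to the notebook task, whereas a spark_python_task in the same job would not receive them as key-value pairs:

resources:
  jobs:
    my_job:
      name: my_job
      # Job-level parameters are pushed down automatically to tasks that
      # accept key-value parameters (notebook, Python wheel with keyword
      # arguments, SQL, Run Job), but not to spark_python_task.
      parameters:
        - name: run_date
          default: "20240101"
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/my_notebook.py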

zed
New Contributor III

Hi, thank you for your response. I have a few follow-up questions to clarify best practices for passing parameters to Python files:

  1. If I want to pass parameters, should I avoid using spark_python_task for Python scripts?
  2. In the context of using Databricks Asset Bundles, is it generally discouraged to submit Databricks jobs using Python files (vs. notebooks)?
  3. I was able to pass parameters with --python-params, e.g. --run_date 20240101, and then read them with an argument parser (see the sketch after this list). Is it accurate to say that spark_python_task does not support key-value parameter passing? If so, what would you recommend if I want to keep my project in Python files rather than notebooks while still being able to pass parameters?
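
A minimal sketch of that argument-parser approach, assuming a hypothetical script my_script.py run as a spark_python_task, with the values arriving as plain command-line arguments:

# my_script.py -- hypothetical entry point for a spark_python_task.
# spark_python_task forwards its `parameters` list as plain command-line
# arguments, so key-value pairs must be parsed manually, e.g. with argparse.
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_date", required=True, help="Run date as YYYYMMDD")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Running for date: {args.run_date}")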

Thank you for your help with this!

zed
New Contributor III

I have one more question. To clarify, my previous questions focused on jobs that are triggered manually, without scheduling.

For scheduled jobs, particularly those using Python script tasks, how can I configure the `job.yml` resource and the Python script to dynamically retrieve `{{job.start_time.iso_date}}` at runtime?
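
A minimal sketch of one way to wire this up, assuming dynamic value references are substituted in spark_python_task parameters at run time (job name, schedule, and file path are hypothetical):

resources:
  jobs:
    my_scheduled_job:
      name: my_scheduled_job
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # daily at 06:00
        timezone_id: UTC
      tasks:
        - task_key: main
          spark_python_task:
            python_file: ./src/my_script.py
            # Resolved at run time and passed to the script as ordinary
            # command-line arguments.
            parameters:
              - "--run_date"
              - "{{job.start_time.iso_date}}"

The script would then read --run_date with the same argparse pattern as above.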

Thanks again!

Walter_C
Databricks Employee

zed
New Contributor III

So, I think it works if I change the spark_python_task to a notebook_task but keep the file as a Python file instead of a notebook. Now I can easily use Databricks widgets to retrieve those parameters, and I also get to version-control Python files instead of notebooks.
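
A minimal sketch of that pattern, assuming the .py file carries the Databricks notebook source header so it can run as a notebook_task (file name hypothetical):

# Databricks notebook source
# my_script.py -- a hypothetical Python file deployed as a notebook; the
# header comment above lets Databricks treat the .py file as a notebook.
# notebook_task parameters arrive as widgets, and dbutils is injected
# automatically when the file runs as a notebook.
run_date = dbutils.widgets.get("run_date")
print(f"Running for date: {run_date}")

This keeps plain .py files in version control while still getting key-value parameter push-down from the job.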
