Thursday
Hi Everyone,
I'm working on scheduling a job and would like to pass parameters that I've defined in my notebook. Ideally, I'd like these parameters to be dynamic, meaning that if I update their values in the notebook, the scheduled job should automatically use the latest values. Is there a way to achieve this, or any known workaround?
These are my parameters in the notebook.
I use the same parameters across all my notebooks.
Thanks for your help!
Thursday
Why not put some extra code in the notebook to handle job input parameters and then assign the notebook default values based on some custom rule? As far as I know, there is no built-in feature to achieve your goal.
Thursday
Hi @Coffee77, thank you for the response. Could you suggest how to proceed, as I am new to this environment?
Thursday - last edited Thursday
Hi @Raj_DB ,
Yep, you just need to use task values. They let you pass arbitrary values between tasks in a Databricks job.
So, for instance, in your notebook you can define the values you want to pass to your next job/task in the following way.
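For example, something like this in the upstream notebook (a minimal sketch; the task is assumed to be named ReadConfig as in the reference below, and the catalog name is just a placeholder):
# Upstream notebook, running as the "ReadConfig" task of the job.
# Publish a task value that downstream tasks can reference.
dbutils.jobs.taskValues.set(key="catalog", value="my_dev_catalog")  # placeholder value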
Then, in the Databricks workflow, you can pass them to the downstream job/task. In the example above I defined a catalog value that I can now pass to the job parameter:
{{tasks.ReadConfig.values.catalog}}
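On the receiving side, the downstream notebook reads the job parameter like any other widget (a sketch, assuming the job parameter is also named catalog):
# Downstream notebook: the "catalog" job parameter is filled in from the task value above.
dbutils.widgets.text("catalog", "")  # empty default; the job supplies the real value
catalog = dbutils.widgets.get("catalog")
print(f"Using catalog: {catalog}")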
Use task values to pass information between tasks | Databricks on AWS
Thursday
Nice!
Thursday
Thank you @szymon_dybczak, I will definitely try it. I hope it will work.
Thursday
No problem @Raj_DB, it should work. I'm using this approach to dynamically pass parameters in the current project I'm part of.
Saturday
Thank you so much @szymon_dybczak, it worked perfectly! I'm also exploring the idea of maintaining a single notebook to pass parameters and reusing it across different jobs. Do you think that would be feasible with this approach, especially considering that each notebook might require different parameters? I'd really appreciate any suggestions you might have.
Thursday
I see you're using dbutils.widgets.text and dropdown. Perfect! You're already on the right track.
Your widgets are already dynamic! Just pass parameters in your job configuration:
In your notebook (slight refactor of your code):
# Define widgets with defaults
dbutils.widgets.text("Month_refresh", "3")
dbutils.widgets.dropdown("Save_environment", "preprod", ["preprod", "prod"])
dbutils.widgets.dropdown("Save_Layer", "silver", ["bronze", "silver", "gold"])
dbutils.widgets.text("Save_folder", "Test/SalesData")
# Use the widget values
month = dbutils.widgets.get("Month_refresh")
env = dbutils.widgets.get("Save_environment")
layer = dbutils.widgets.get("Save_Layer")
folder = dbutils.widgets.get("Save_folder")
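For instance (purely illustrative; the path layout below is an assumption, not from your screenshot), the values could then drive where the refresh writes its output:
# Combine the parameters into an output location (example layout only)
output_path = f"/mnt/{env}/{layer}/{folder}"
print(f"Refreshing month {month}, writing to {output_path}")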
In your job configuration:
These job parameters will override your notebook defaults!
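As a sketch, the job parameters simply mirror the widget names (the values below are examples only; in practice you enter them as key/value pairs in the job's parameter settings or in the job JSON/bundle definition):
# Example job parameters -> widget names (values are illustrative)
job_parameters = {
    "Month_refresh": "6",
    "Save_environment": "prod",
    "Save_Layer": "gold",
    "Save_folder": "Test/SalesData",
}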
Alternatively, if you'd rather not touch the job configuration at all, you can compute the defaults inside the notebook:
# Auto-detect environment
is_prod = spark.catalog.currentCatalog() == "prod_catalog"
default_env = "prod" if is_prod else "preprod"
default_layer = "gold" if is_prod else "silver"
dbutils.widgets.dropdown("Save_environment", default_env, ["preprod", "prod"])
dbutils.widgets.dropdown("Save_Layer", default_layer, ["bronze", "silver", "gold"])
This way, your scheduled jobs automatically adapt to the environment they run in.
Is this what you were looking for, or did you need the parameters to update without touching the job configuration?
Thursday
That would work indeed. However, the solution provided by @szymon_dybczak is really clean. In your code, if you have separate workspaces per environment, I would suggest deriving the current environment from the current workspace, or from environment variables injected into all job or all-purpose clusters, where you can store your custom environment names. You can do this via DAB (Databricks Asset Bundles) along with Databricks CLI scripts. Take into account that you can have multiple catalogs per environment, as in my use case.
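A minimal sketch of that idea, assuming a custom environment variable (ENV_NAME here is just an example name) has been injected into the clusters via DAB, with a fallback that inspects the workspace URL:
import os

# Custom cluster environment variable set via DAB (ENV_NAME is an example name)
env = os.environ.get("ENV_NAME")

if env is None:
    # Fallback: derive the environment from the workspace URL
    # (spark.databricks.workspaceUrl is typically available on Databricks clusters;
    #  matching on "prod" in the URL is an assumption about your naming convention)
    workspace_url = spark.conf.get("spark.databricks.workspaceUrl", "")
    env = "prod" if "prod" in workspace_url else "preprod"

dbutils.widgets.dropdown("Save_environment", env, ["preprod", "prod"])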