11-30-2021 11:45 PM
Hi,
How can we pass a list as a parameter to a Databricks notebook, so that the notebook runs in parallel for each value in the list?
Thank you
12-01-2021 02:07 AM
Not sure what exactly you want to do, but you can use a widget to pass in parameters. These widgets can be of several types, including ones that accept multiple values.
What I do not understand is the part about running the notebook in parallel. Do you want to run the same notebook in multiple jobs concurrently?
12-01-2021 02:17 AM
Hi Werners,
I would like to pass a list of values to the Databricks notebook as an input parameter (list type), for example ["Eu","JP","APAC"], and I need to run my notebook transformations for each value of the list in parallel.
Note: the values of the list should come from the user.
12-01-2021 02:22 AM
OK, so passing in the values can be done with widgets.
But the notebook itself will already run in parallel, as it runs on Spark.
So parallelism is already there.
If there is a reason you want to control the parallelism (which is the case, I think), you will have to launch multiple instances of the notebook at once:
https://docs.databricks.com/notebooks/notebook-workflows.html
12-01-2021 02:35 AM
Notebook code is executed on the driver. To achieve parallelism, you just need to create a Spark dataframe from your list.
As @Werner Stinckens said, you can also run multiple notebooks together. In that case you do not pass the list; instead you pass one parameter from your list to each notebook run:
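from multiprocessing.pool import ThreadPool

my_params = ["Eu", "JP", "APAC"]
pool = ThreadPool(4)  # match cpu cores here
# Start one notebook run per value; each run reads its value from the widget "my_widget".
# dbutils.notebook.run(path, timeout_seconds, arguments) is only available inside Databricks.
pool.map(
    lambda my_param: dbutils.notebook.run("my_notebook", 3600, {"my_widget": my_param}),
    my_params)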
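The same fan-out pattern can be tried outside Databricks with a small sketch like the one below. Note that `run_one` and `run_all` are hypothetical names used only for illustration, and `run_one` is a stand-in for the real notebook call, so this shows the threading pattern rather than an actual notebook run.

```python
from multiprocessing.pool import ThreadPool


def run_one(region):
    # Hypothetical stand-in for one notebook run. On Databricks this would be
    # replaced by something like:
    #   dbutils.notebook.run("my_notebook", 3600, {"my_widget": region})
    return f"processed {region}"


def run_all(regions, max_parallel=4):
    # Threads are sufficient here because each worker mostly waits for its run.
    with ThreadPool(max_parallel) as pool:
        # map() blocks until every call returns and keeps the input order.
        return pool.map(run_one, regions)


results = run_all(["Eu", "JP", "APAC"])
print(results)
```

Because `map()` preserves the order of the input list, each result can be matched back to the value that produced it.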
12-01-2021 03:44 AM
We implemented our code using a thread pool:
pool = ThreadPool(mp.cpu_count())
pool.map(fn_name, value_list)
My question is how we can pass a list type to the notebook using widgets. Currently we take a string input and split it.
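For reference, the string-and-split approach can be sketched as follows. This assumes the widget value arrives as a single comma-separated string; `parse_list_param` is a hypothetical helper name, and the literal input below stands in for the value that would be read from the widget.

```python
def parse_list_param(raw, sep=","):
    # Split on the separator, trim surrounding whitespace, and drop empty items
    # so that stray spaces or a trailing comma do not produce bogus values.
    return [item.strip() for item in raw.split(sep) if item.strip()]


# Stand-in for the string read from the notebook widget.
regions = parse_list_param("Eu, JP,APAC,")
print(regions)
```

This breaks down if a value can itself contain the separator, which is one reason to prefer a structured encoding for anything beyond simple tokens.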
12-01-2021 03:51 AM
You can convert the list to a dataframe and register it as a table or view, so that it is accessible from all notebooks.
12-01-2021 03:57 AM
You could use a multiselect widget; there is another topic about that.
How do you pass the values into this widget? That can be done in several ways, depending on which scheduling tool you use.
I use Data Factory, where I define the values that have to be sent to the notebook widget.
12-01-2021 04:02 AM
Yet another way (in Databricks you can achieve everything in many ways) is to encode the list as a string using the json library:
import json
# json.dumps turns the list into a plain string, which can be passed as a parameter.
print(type(json.dumps([1, 2, 3])))
#>> <class 'str'>
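A minimal round-trip sketch of this idea, assuming the encoded string is passed as the notebook parameter and decoded again inside the notebook. The variable names are illustrative only.

```python
import json

regions = ["Eu", "JP", "APAC"]

# Caller side: encode the list as a single string parameter.
payload = json.dumps(regions)

# Notebook side: decode the string back into a Python list.
decoded = json.loads(payload)

print(payload)
print(decoded)
```

Unlike splitting on a separator, this keeps values intact even if they contain commas, and it also works for nested lists or dictionaries.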