11-30-2021 11:45 PM
Hi,
How can we pass a list as a parameter to a Databricks notebook, so that the notebook runs in parallel for each value in the list?
Thank you
12-01-2021 02:07 AM
Not sure what exactly you want to do, but you can use a widget to pass in parameters. Widgets come in several types, including ones that accept multiple values.
What I do not understand is the part about running the notebook in parallel. Do you want to run the same notebook in multiple jobs concurrently?
12-01-2021 02:17 AM
Hi Werners,
I would like to pass a list of values to a Databricks notebook input parameter (list type), for example ["Eu","JP","APAC"], and run my notebook transformations for each value of the list in parallel.
Note: the values of the list should come from the user.
12-01-2021 02:22 AM
OK, so passing in the values can be done with widgets.
But the notebook itself already runs in parallel, as it runs on Spark.
So the parallelism is already there.
If there is a reason you want to control the parallelism (which I think is the case), you will have to launch multiple instances of the notebook at once:
https://docs.databricks.com/notebooks/notebook-workflows.html
12-01-2021 02:35 AM
Notebook code is executed on the driver. To achieve parallelism, you just need to create a Spark DataFrame from your list.
As @Werner Stinckens said, you can also run multiple notebooks together. In that case you do not pass the list; you pass one parameter from the list to each notebook run:
from multiprocessing.pool import ThreadPool

my_params = ["Eu", "JP", "APAC"]
pool = ThreadPool(4)  # match cpu cores here
# dbutils.notebook.run(path, timeout_seconds, arguments) starts one run per value
pool.map(
    lambda my_param: dbutils.notebook.run("my_notebook", 3600, {"my_widget": my_param}),
    my_params)
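A minimal sketch of the same fan-out that also records the result or error of each run, so one failing value does not hide the others. The names `run_for_all`, `run_one` and `fake_notebook` are hypothetical; `dbutils` only exists inside a Databricks notebook, so a stand-in function is used here and the real call is shown in a comment.

```python
from multiprocessing.pool import ThreadPool


def run_for_all(run_one, params, max_workers=4):
    """Call run_one(param) for every param concurrently.

    Returns a dict mapping each param to ("ok", result) or ("error", message).
    """
    def safe_run(param):
        try:
            return param, ("ok", run_one(param))
        except Exception as exc:  # keep going if a single run fails
            return param, ("error", str(exc))

    with ThreadPool(max_workers) as pool:
        return dict(pool.map(safe_run, params))


# Stand-in for the real call. On Databricks this could be, for example:
#   lambda p: dbutils.notebook.run("my_notebook", 3600, {"my_widget": p})
def fake_notebook(region):
    if region == "JP":
        raise RuntimeError("simulated failure")
    return "done:" + region


results = run_for_all(fake_notebook, ["Eu", "JP", "APAC"])
```

Threads are enough here because each worker only waits for a notebook run to finish; the heavy work happens in the launched runs, not in the calling thread.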
12-01-2021 03:44 AM
We implemented our code using a thread pool:
import multiprocessing as mp
from multiprocessing.pool import ThreadPool
pool = ThreadPool(mp.cpu_count())
pool.map(fn_name, value_list)
My question is how we can pass a list type to the notebook using widgets. Currently we take a string input and split it.
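For reference, the split-a-string approach might look roughly like the sketch below. `parse_list_param` and the widget name `regions` are hypothetical; the `dbutils.widgets.get` call is shown only as a comment so the snippet also runs outside Databricks.

```python
def parse_list_param(raw, sep=","):
    """Turn a delimited widget string such as "Eu, JP,APAC" into a list.

    Surrounding whitespace is stripped and empty items are dropped.
    """
    return [item.strip() for item in raw.split(sep) if item.strip()]


# On Databricks the raw string would come from a text widget, for example:
#   raw = dbutils.widgets.get("regions")
raw = "Eu, JP ,APAC,"
regions = parse_list_param(raw)
print(regions)
```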
12-01-2021 03:51 AM
You can convert the list to a DataFrame and register it as a table or view, so that it is accessible from all notebooks.
12-01-2021 03:57 AM
You could use a multiselect widget; here is another topic about that.
How do you pass the values into this widget? That can be done in several ways, depending on which scheduling tool you use.
I use Data Factory, where I define the values to be sent to the notebook widget.
12-01-2021 04:02 AM
Another way (in Databricks you can achieve everything in many ways) is to encode the list as a string using the json library:
import json
print(type(json.dumps([1, 2, 3])))
#>> <class 'str'>
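A short sketch of the full round trip, assuming the caller passes the encoded string as an ordinary widget value and the notebook decodes it again. The widget name `regions` is hypothetical, and the `dbutils` call appears only in a comment so the snippet runs anywhere.

```python
import json

# Caller side: encode the list as one string parameter.
regions = ["Eu", "JP", "APAC"]
encoded = json.dumps(regions)

# Notebook side: the widget value arrives as a string and is decoded back.
# On Databricks this could be, for example:
#   decoded = json.loads(dbutils.widgets.get("regions"))
decoded = json.loads(encoded)
```

Compared with splitting on a delimiter, JSON keeps values that themselves contain commas intact and preserves numbers as numbers.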