Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to send a list as parameter in databricks notebook task

SailajaB
Valued Contributor III

Hi,

How can we pass a list as a parameter to a Databricks notebook, so that the notebook runs in parallel for each value in the list?

Thank you

8 REPLIES 8

-werners-
Esteemed Contributor III

I'm not sure exactly what you want to do, but you can use a widget to pass in parameters. These widgets come in several types, including multi-value ones.

What I do not understand is the part about running the notebook in parallel. Do you want to run the same notebook in multiple jobs concurrently?
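A minimal sketch of the widget approach. The widget name "countries" and the default value are illustrative, and `dbutils` only exists inside the Databricks runtime, so those calls are shown as comments:

```python
# Databricks-only setup (dbutils is provided by the notebook runtime):
# dbutils.widgets.text("countries", "Eu,JP,APAC", "Countries")
# raw = dbutils.widgets.get("countries")
raw = "Eu,JP,APAC"  # stand-in for the string dbutils.widgets.get would return
countries = [c.strip() for c in raw.split(",")]  # -> ["Eu", "JP", "APAC"]
```

Widget values always arrive as strings, which is why the splitting step is needed.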

SailajaB
Valued Contributor III

Hi Werners,

I would like to pass a list of values to the Databricks notebook's input parameter (list type), for example ["Eu","JP","APAC"], and run my notebook's transformations for each value in the list in parallel.

Note: the values in the list should come from the user.

-werners-
Esteemed Contributor III

OK, so passing in the values can be done with widgets.

But the notebook itself already runs in parallel, since it runs on Spark. So parallelism is already there.

If there is a reason you want to control the parallelism yourself (which I think is the case), you will have to launch multiple instances of the notebook at once:

https://docs.databricks.com/notebooks/notebook-workflows.html

Hubert-Dudek
Esteemed Contributor III

Notebook code is executed on the driver. To achieve parallelism inside one notebook, you just need to create a Spark DataFrame from your list.

As @Werner Stinckens said, you can also run multiple notebooks together. In that case you don't pass the list at all; instead, you pass one parameter from your list to each notebook:

from multiprocessing.pool import ThreadPool

my_params = ["Eu", "JP", "APAC"]
pool = ThreadPool(4)  # match to how many notebooks you want running at once
pool.map(
    lambda my_param: dbutils.notebook.run("my_notebook", 3600, {"my_widget": my_param}),
    my_params)

SailajaB
Valued Contributor III

We implemented our code using ThreadPool:

import multiprocessing as mp
from multiprocessing.pool import ThreadPool

pool = ThreadPool(mp.cpu_count())
pool.map(fn_name, value_list)

My question is how we can pass a list type to the notebook using widgets. Currently we take a string input and split it.

Hubert-Dudek
Esteemed Contributor III

You can convert the list to a DataFrame and register it as a table/view, so it will be accessible from all notebooks.

-werners-
Esteemed Contributor III

You could use a multiselect widget; there is another topic about that.

How do you pass in the values for this widget? That can be done in several ways, depending on which scheduling tool you use.

I use Data Factory, where I define the values that have to be sent to the notebook widget.
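A sketch of the multiselect route (the widget name and choices are illustrative; `dbutils` only exists in the Databricks runtime, so those calls are commented). A multiselect widget still returns the selections as one comma-separated string, so you split it on the notebook side:

```python
# Databricks-only setup, shown as comments:
# dbutils.widgets.multiselect("regions", "Eu", ["Eu", "JP", "APAC"])
# selected = dbutils.widgets.get("regions")
selected = "Eu,JP"  # stand-in for the value returned when "Eu" and "JP" are picked
regions = selected.split(",")  # -> ["Eu", "JP"]
```

One caveat of the comma-split approach: it breaks if a value itself contains a comma, which is where the JSON-encoding idea below in this thread is more robust.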

Hubert-Dudek
Esteemed Contributor III

Yet another way 🙂 (in Databricks you can achieve everything in many ways) is to encode the list using the json library:

import json
print(type(json.dumps([1, 2, 3])))
# <class 'str'>
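On the receiving side, the notebook can decode the string back into a real list. A round-trip sketch (in an actual notebook, `payload` would come from `dbutils.widgets.get` rather than being built locally):

```python
import json

# Caller side: serialize the list into a plain string widget value
payload = json.dumps(["Eu", "JP", "APAC"])

# Notebook side: decode the widget string back into a Python list
countries = json.loads(payload)  # -> ["Eu", "JP", "APAC"]
```

This avoids the delimiter problems of manual string splitting, since json handles quoting and escaping for you.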
