Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to send a list as parameter in databricks notebook task

SailajaB
Valued Contributor III

Hi,

How can we pass a list as a parameter to a Databricks notebook, so that the notebook runs in parallel for each value in the list?

Thank you

9 REPLIES

Kaniz
Community Manager

Hi @SailajaB! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise I will get back to you soon. Thanks.

-werners-
Esteemed Contributor III

Not sure what exactly you want to do, but you can use a widget to pass in parameters. Widgets come in several types, including ones that accept multiple values.

What I don't understand is the part about running the notebook in parallel. Do you want to run the same notebook in multiple jobs concurrently?
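A minimal sketch of the multiselect approach mentioned above. In a Databricks notebook you would create the widget with `dbutils.widgets.multiselect` and read it back with `dbutils.widgets.get` (the widget name "regions" and the choices are just examples); the one detail to know is that `get` hands the selection back as a single comma-separated string:

```python
# In a Databricks notebook you would create the widget like this:
#   dbutils.widgets.multiselect("regions", "Eu", ["Eu", "JP", "APAC"])
# and read it back with dbutils.widgets.get("regions"), which returns
# the selection as ONE comma-separated string, e.g.:
selection = "Eu,JP,APAC"  # what dbutils.widgets.get would hand back

# Split it back into a Python list before use.
regions = selection.split(",")
```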

SailajaB
Valued Contributor III

Hi Werners,

I would like to pass a list of values to the Databricks notebook's input parameter (list type), for example ["Eu","JP","APAC"], and run my notebook transformations for each value of the list in parallel.

Note: the values of the list should come from the user.

-werners-
Esteemed Contributor III

OK, so passing in the values can be done with widgets.

The notebook itself already runs in parallel, because it runs on Spark, so parallelism is already there.

If there is a reason you want to control the parallelism yourself (which I think is the case), you will have to launch multiple instances of the notebook at once:

https://docs.databricks.com/notebooks/notebook-workflows.html

Hubert-Dudek
Esteemed Contributor III

Notebook code is executed on the driver; to achieve parallelism you just need to create a Spark DataFrame from your list.

As @Werner Stinckens​ said, you can also run multiple notebooks together; in that case you don't pass the whole list, just one parameter from it to every notebook:

from multiprocessing.pool import ThreadPool

my_params = ["Eu", "JP", "APAC"]
pool = ThreadPool(4)  # number of notebooks to run concurrently
pool.map(
    lambda my_param: dbutils.notebook.run("my_notebook", 3600, {"my_widget": my_param}),
    my_params)

SailajaB
Valued Contributor III

We implemented our code using ThreadPool:

import multiprocessing as mp
from multiprocessing.pool import ThreadPool

pool = ThreadPool(mp.cpu_count())
pool.map(fn_name, value_list)

My question is how we can pass a list type to the notebook using widgets. Currently we are taking a string input and splitting it.

Hubert-Dudek
Esteemed Contributor III

You can convert the list to a DataFrame and register it as a table/view so it will be accessible from other notebooks.
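A sketch of that idea, assuming a Databricks notebook where `spark` (the SparkSession) is predefined; the view name "regions" is just an example. Note that a plain temp view is session-scoped, so for other notebooks on the same cluster a global temp view is the safer choice:

```python
regions = ["Eu", "JP", "APAC"]

# Wrap each value in a one-element tuple, the row shape
# spark.createDataFrame expects for a single-column DataFrame.
rows = [(r,) for r in regions]

# Inside the notebook you would then register a global temp view:
#   df = spark.createDataFrame(rows, ["region"])
#   df.createOrReplaceGlobalTempView("regions")
# and read it from other notebooks on the same cluster with:
#   spark.table("global_temp.regions")
```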

-werners-
Esteemed Contributor III

You could use a multiselect widget; here is another topic about that.

How to pass the values into this widget? That can be done in several ways, depending on which scheduling tool you use.

I use Data Factory, where I define the values that have to be sent to the notebook widget.

Hubert-Dudek
Esteemed Contributor III

Another way 🙂 (in Databricks you can achieve everything in many ways) is to encode the list using the json library:

import json
print(type(json.dumps([1, 2, 3])))
#>> <class 'str'>
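Putting both halves of that idea together, a small round-trip sketch: the caller serializes the list to a string (which any widget or notebook parameter can carry), and the called notebook decodes it back into a real Python list:

```python
import json

# Caller side: the list becomes a plain string, so it fits in a
# text widget or a notebook parameter.
payload = json.dumps(["Eu", "JP", "APAC"])

# Called-notebook side: decode the string to recover the original list.
regions = json.loads(payload)
```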
